I have a website full of three-column tables that I would like to 'strip' of their outer columns, leaving just the centre one (to be transformed later into a variable for insertion into a PHP template). I can't seem to come up with a working regex that selects only tables with three columns. There are sometimes several tables in a page, and everything I come up with selects 'across' tables. I've tried negative and positive lookahead, everything.
The biggest problem with this sort of search/replace is the line breaks and unpredictable characters that have to be captured between <td></td> tags - everything I come up with will select 'across' <tr> and <table> tags. Can anyone help? Thanks in advance.
You are trying to grab the bolded portions?
<table>
<tr><td>R1C1</td><td>Row1Column2</td><td>R1C3</td></tr>
<tr><td>R2C1</td><td>Row2Column2</td><td>R2C3</td></tr>
<tr><td>R3C1</td><td>Row3Column2</td><td>R3C3</td></tr>
</table>
<table>
<tr><td>R1C1</td><td>R1C2</td><td>R1C3</td><td>R1C4</td></tr>
<tr><td>R2C1</td><td>R2C2</td><td>R2C3</td><td>R2C4</td></tr>
<tr><td>R3C1</td><td>R3C2</td><td>R3C3</td><td>R3C4</td></tr>
</table>
Actually, it's simpler:
<table>
<tr><td>T1C1</td><td>Table1Column2</td><td>T1C3</td></tr>
</table>
<table>
<tr><td>T1C1</td><td>Table2Column2</td><td>T1C3</td></tr>
</table>
<table>
<tr><td>T1C1</td><td>Table3Column2</td><td>T1C3</td></tr>
</table>
...and the central column's content is varied and full of other 'tagged' input (text, images and comments) and carriage returns.
[search.cpan.org...]
or
[search.cpan.org...]
might be the ticket.
Or you can try to use a regexp, but you may have no hair left at all before you figure out a reliable way to parse HTML with one.
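For anyone who would rather not lose the hair: here is a rough sketch of the parser route. It is written in Python with the standard-library html.parser rather than the CPAN modules linked above, so treat it as an illustration of the idea and not as those modules' API. The names CentreColumnParser and centre_cells are made up for this example. It assumes there are no nested tables, that cells are plain <td> (no <th>), and that the </td> tags are actually present. It keeps the raw markup inside each cell (tags, comments, entities, line breaks) and returns the centre cells only for tables where every row has exactly three cells.

```python
from html.parser import HTMLParser


class CentreColumnParser(HTMLParser):
    """Collects the raw inner HTML of every <td>, grouped by table and row."""

    def __init__(self):
        # convert_charrefs=False so entities reach handle_entityref/handle_charref
        super().__init__(convert_charrefs=False)
        self.tables = []   # tables -> rows -> cell strings
        self._cell = None  # raw fragments of the currently open <td>, else None

    def _raw(self, text):
        # Only keep text while we are inside a cell
        if self._cell is not None:
            self._cell.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "table":            # assumption: tables are never nested
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag == "td" and self.tables and self.tables[-1]:
            self._cell = []
        else:
            self._raw(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged
        self._raw(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell))
            self._cell = None
        elif tag not in ("table", "tr"):
            self._raw("</%s>" % tag)

    def handle_data(self, data):
        self._raw(data)

    def handle_entityref(self, name):
        self._raw("&%s;" % name)

    def handle_charref(self, name):
        self._raw("&#%s;" % name)

    def handle_comment(self, data):
        self._raw("<!--%s-->" % data)


def centre_cells(html):
    """Return the centre cells, one list per table that is three columns wide."""
    parser = CentreColumnParser()
    parser.feed(html)
    parser.close()
    return [
        [row[1] for row in table]
        for table in parser.tables
        if table and all(len(row) == 3 for row in table)
    ]
```

Calling centre_cells(page_source) gives one list per three-column table, in page order, with each entry being the untouched contents of a centre cell. Tables with any other column count are simply skipped.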
Extracting the data from the table could be a solution for sure - but I do have a few hairs left.
All this is to the goal of preparing the content for insertion into a new php skin - so I wanted the centre column to retain its formatting (img, style tags) but as a unique table - which eventually would be stripped as well. The problem for now is 'frame and isolate'. Already cleaning the 1995-era html was a chore : P
If someone does solve this stickler through pure regex (without the above), hats off - every hat I own!
Thanks for all your input - I'll be checking back : )
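One possible way to stop a pattern from selecting 'across' tables is to put a negative lookahead in front of every character it consumes, so a match can never run past the next <table>, <tr> or <td> boundary. Below is a minimal sketch of that idea in Python (the same lookahead syntax exists in Perl and PHP's PCRE). The function name keep_centre_column is made up for illustration. It assumes no nested tables, plain <td> cells with closing tags, and it rebuilds the table without the original attributes. It counts the cells per row in a small callback instead of in one giant expression, so it is not quite 'pure' regex.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL  # DOTALL lets "." cross line breaks

# Each body is "any character, as long as the same kind of tag does not start here"
TABLE_RE = re.compile(r"<table\b[^>]*>((?:(?!</?table\b).)*)</table>", FLAGS)
ROW_RE = re.compile(r"<tr\b[^>]*>((?:(?!</?tr\b).)*)</tr>", FLAGS)
CELL_RE = re.compile(r"<td\b[^>]*>((?:(?!</?td\b).)*)</td>", FLAGS)


def keep_centre_column(html):
    """Rewrite every table whose rows all have three cells to its centre column."""

    def rewrite(match):
        # One list of cell contents per row of this table
        rows = [CELL_RE.findall(row) for row in ROW_RE.findall(match.group(1))]
        if not rows or any(len(cells) != 3 for cells in rows):
            return match.group(0)  # not a three-column table: leave untouched
        body = "\n".join("<tr><td>%s</td></tr>" % cells[1] for cells in rows)
        return "<table>\n%s\n</table>" % body

    return TABLE_RE.sub(rewrite, html)
```

Whatever sits inside the centre cell (images, style tags, comments, carriage returns) is copied through as-is, and four-column tables come out unchanged. If the pages do contain nested tables, this will only ever match the innermost ones, which is where a real parser starts to look attractive again.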