
Regular Expressions, Too damn Greedy

Modifiers in regexp.

hollow

4:17 pm on Jun 25, 2004 (gmt 0)

10+ Year Member



This is getting to me. I need to go through a big HTML table and find all of the table cells that contain a keyword. Here's what I've got so far:

var checkCells=new RegExp("(<TD.*?keyword.*?/TD>)","gi");

Two problems.

First, the dot doesn't match line breaks, and the s modifier doesn't seem to be supported. For some reason, this doesn't work:

var checkCells=new RegExp("(<TD[\r\n.]*?keyword[\r\n.]*?/TD>)","gi");
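Inside square brackets the dot loses its special meaning and matches only a literal period, so `[\r\n.]` matches a carriage return, a line feed or a full stop, and nothing else. A common workaround in JavaScript is a class that matches every character, such as `[\s\S]`. A small sketch, using a made-up sample string:

```javascript
// Sample markup with line breaks inside the cell (made up for illustration).
var html = "<TD>\nsome keyword here\n</TD>";

// Inside [...] the dot is a literal ".", so this class only matches \r, \n or ".".
var broken = new RegExp("(<TD[\\r\\n.]*?keyword[\\r\\n.]*?/TD>)", "gi");

// [\s\S] means "any whitespace or any non-whitespace", i.e. any character,
// line breaks included, so it behaves like a dot that can cross lines.
var fixed = new RegExp("(<TD[\\s\\S]*?keyword[\\s\\S]*?/TD>)", "gi");

console.log(html.match(broken)); // null
console.log(html.match(fixed));  // one match: the whole cell
```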

Second, the regular expression is too greedy: if it encounters a row with six cells and the keyword is in the fourth, it returns the first four cells. So it only seems to be greedy on the left-hand side, not the right.
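Strictly speaking, the pattern isn't being greedy: `.*?` is already lazy. The catch is that the engine tries starting positions from left to right and reports the first one where a match succeeds. From the very first `<TD` a match is possible by letting `.*?` stretch across the earlier cells until it reaches the keyword, so that's what gets returned. Laziness only keeps the end of a match short; it never moves the start. A demonstration with a made-up six-cell row:

```javascript
// Six cells on one line; the keyword is in the fourth (sample data).
var row = "<TD>a</TD><TD>b</TD><TD>c</TD><TD>keyword</TD><TD>e</TD><TD>f</TD>";

var checkCells = new RegExp("(<TD.*?keyword.*?/TD>)", "gi");
var result = row.match(checkCells);

// The match starts at the FIRST <TD, the leftmost position from which a
// match is possible; the lazy .*? then has to swallow cells 1-3.
console.log(result[0]);
// prints: <TD>a</TD><TD>b</TD><TD>c</TD><TD>keyword</TD>
```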

Any ideas, folks?

[edited by: hollow at 4:19 pm (utc) on June 25, 2004]

[edited by: DrDoc at 4:23 pm (utc) on June 25, 2004]

stevenmusumeche

4:18 pm on Jun 25, 2004 (gmt 0)

10+ Year Member



I believe there is a slightly different syntax for multi-line regular expressions.
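For what it's worth, JavaScript's `m` (multiline) flag only changes what `^` and `$` match (the start and end of each line instead of the whole string); it doesn't make `.` match line breaks. The `s` (dotAll) flag that does that was only added to JavaScript in ES2018, long after this thread, so the comparison below assumes a modern engine:

```javascript
var text = "<TD>\nkeyword\n</TD>";

// m only affects ^ and $, so the dot still refuses to match "\n": no match.
var withM = /<TD>.*keyword/m.test(text);

// s (dotAll, ES2018 and later) lets the dot match line terminators too.
var withS = /<TD>.*keyword/s.test(text);

console.log(withM, withS); // false true
```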

DrDoc

4:26 pm on Jun 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let me take a stab at this...
First off, are you using the innerHTML of a table row? If not, where does the data string come from?

Then, I would modify the regular expression to something like this, which is probably closer to what you want:

var checkCells=new RegExp("(<TD[^>]*>[^<]*keyword[^<]*</TD>)","gi");
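A quick check of this pattern against made-up markup shows both what it buys and where it stops. `[^>]*` allows attributes on the opening tag, and because `[^<]*` can never step past a `<`, a match can't spill into a neighbouring cell. The same restriction means a cell that contains other tags (a `<p>`, say) is skipped:

```javascript
var checkCells = new RegExp("(<TD[^>]*>[^<]*keyword[^<]*</TD>)", "gi");

var html =
  '<TD>a</TD><TD class="x">some keyword</TD>' + // plain-text cell: matched
  "<TD><p>keyword in a paragraph</p></TD>";     // nested markup: not matched

var found = html.match(checkCells);
console.log(found.length, found[0]); // 1 <TD class="x">some keyword</TD>
```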

Bernard Marx

4:31 pm on Jun 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What I know about RE you could fit in a matchbox, and still have room for all the matches.
Have you considered using a DOM approach to this (as a backup)?

It's possibly less efficient, I don't know, but it would certainly make your problems easier to solve.

hollow

4:34 pm on Jun 25, 2004 (gmt 0)

10+ Year Member



Sorry, I should have mentioned that the [^<] plan won't work, as the cells can also contain HTML: paragraph tags and the like.

What I'd ideally need is something more like this:

var checkCells=new RegExp("(<TD>(^<TD>)*keyword(^</TD>)*</TD>)","gi");

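but of course this won't work either, because the brackets don't work like that: ^ only means "not" inside a character class, and a class matches single characters, not a whole string like <TD>...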
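What `(^<TD>)*` is reaching for, "any run of text that doesn't contain another cell tag", can be written with a negative lookahead, a construct sometimes called a tempered greedy token: `(?:(?!</?TD)[\s\S])*` consumes one character at a time, but only at positions where no `<TD` or `</TD` starts. A sketch under a few assumptions: cells don't contain nested tables, the keyword has no regex special characters, and `findKeywordCells` is a hypothetical helper name. It also puts back the `<TD[^>]*>` part so attributes are allowed:

```javascript
// Hypothetical helper: returns the <TD>...</TD> blocks whose content contains
// `keyword`. Assumes no nested tables and no regex metacharacters in keyword.
function findKeywordCells(html, keyword) {
  // Any characters, line breaks included, as long as no <TD or </TD
  // begins at that position, so a match can never span two cells.
  var inside = "(?:(?!</?TD)[\\s\\S])*";
  var re = new RegExp("<TD[^>]*>" + inside + keyword + inside + "</TD>", "gi");
  return html.match(re) || [];
}

var row =
  "<TD>a</TD><TD>b</TD><TD>c</TD>" +
  "<TD><p>some\nkeyword</p></TD>" + // markup and a line break inside the cell
  "<TD>e</TD><TD>f</TD>";

console.log(findKeywordCells(row, "keyword")); // one match: the fourth cell only
```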

Note: thanks for the <TD[^>]*> point. I've removed it from this example to keep things simple for the time being, but I'll add it back in once I get the rest working.

[edited by: DrDoc at 9:34 pm (utc) on June 25, 2004]
[edit reason] disabling smilies [/edit]

hollow

4:39 pm on Jun 25, 2004 (gmt 0)

10+ Year Member



Bernard Marx, thanks, you could be right there... There was a reason I wasn't using the DOM approach, but I can't remember what it was now...

Calling getElementsByTagName("td") and then looping through the cells, checking each one for the keyword, seems like it could be a good plan...
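The loop described here could look roughly like the sketch below. It assumes the table is already part of the page's DOM; `findCellsWithKeyword` and the `myTable` id in the usage line are hypothetical names. Because it checks each cell's text rather than its markup, nested tags and line breaks inside a cell stop being a problem:

```javascript
// Hypothetical helper: returns the <td> elements under `root` whose text
// contains `keyword` (case-insensitive). `root` can be a table or document.
function findCellsWithKeyword(root, keyword) {
  var cells = root.getElementsByTagName("td");
  var needle = keyword.toLowerCase();
  var found = [];
  for (var i = 0; i < cells.length; i++) {
    // textContent is the cell's text with all markup stripped, so <p>, <b>,
    // line breaks etc. don't get in the way. Old Internet Explorer versions
    // only offered innerText, hence the fallback.
    var text = cells[i].textContent || cells[i].innerText || "";
    if (text.toLowerCase().indexOf(needle) !== -1) {
      found.push(cells[i]);
    }
  }
  return found;
}

// Usage in a browser (the "myTable" id is an assumption):
// var hits = findCellsWithKeyword(document.getElementById("myTable"), "keyword");
```

Whether this is slower than a single regex over the table's HTML depends on the browser and the table size; the gain is that the matching logic no longer has to understand HTML.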