regexp: match one part of a string but not another

SEOViking

11:34 am on Jan 20, 2006 (gmt 0)

I am trying to filter out some search results in Eclipse with regular expressions, but am not having any luck. Say I have these strings:
I do not like her.
I like to do this.
I like to do that.
He likes to do that.

I want to return all lines with "do" that don't contain "that" (without knowing the content of all the lines in all my files).

I have tried this, but it doesn't seem to work:
do.*[^that]

How do you specify to NOT match an entire string (like "that", for example) while still matching another string (like "do") in the same regexp?

Note: I am not using Perl or anything like that, just searching in Eclipse, so please don't give me any wacky syntax =) Thanks in advance.
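
A note on why the first attempt lets everything through: inside square brackets, `^that` does not mean "not the word that". `[^that]` is a character class that matches any single character other than t, h or a. The sketch below uses JavaScript purely for illustration (the variable names are made up for the example) and shows the pattern accepting all four sample lines, because each one ends in a period, which satisfies the class.

```javascript
// The four sample lines from the question.
const lines = [
  'I do not like her.',
  'I like to do this.',
  'I like to do that.',
  'He likes to do that.',
];

// [^that] is a character class: ONE character that is not t, h, or a.
const attempt = /do.*[^that]/;

// Each line has "do" followed later by a character outside the class
// (the final period), so nothing gets filtered out.
const matched = lines.filter((line) => attempt.test(line));
console.log(matched.length); // 4

// The class on its own never "sees" the word: every letter of "that" is excluded.
console.log(/[^that]/.test('that')); // false
```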

SEOViking

7:14 am on Jan 21, 2006 (gmt 0)

So I'm guessing this is impossible then?

SEOViking

8:07 am on Jan 21, 2006 (gmt 0)

Not sure why this post was moved from Foo to javascript, but hope somebody here can help =)

SEOViking

8:14 am on Jan 21, 2006 (gmt 0)

I have come down to this and am out of ideas:

do .*[^t][^h][^a][^t]

This still doesn't work. Help!

DrDoc

7:06 pm on Jan 21, 2006 (gmt 0)

Yeah, I don't know why this ended up in the JavaScript forum either ... since you so far have not specified which language you will be using. :)

Anyway ... regardless of language, the easiest way of doing this is to simply check if it matches word A but not word B separately. So, something like:

if(insert_regexp_matching_function_name(/(^|[^a-z])do([^a-z]|$)/i) && !insert_regexp_matching_function_name(/(^|[^a-z])that([^a-z]|$)/i)) { 
  // we have a match 
}
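
For anyone who wants to try the two-test idea directly, here is a minimal runnable sketch in JavaScript. It assumes you can run code at all (which a plain search box does not allow), and the helper name `hasDoButNotThat` is hypothetical, introduced only for this example.

```javascript
// The two patterns from the post: each word must be bounded by a
// non-letter or by the start/end of the line.
const hasDo = /(^|[^a-z])do([^a-z]|$)/i;
const hasThat = /(^|[^a-z])that([^a-z]|$)/i;

// Hypothetical helper: true when the line contains "do" but not "that".
function hasDoButNotThat(line) {
  return hasDo.test(line) && !hasThat.test(line);
}

console.log(hasDoButNotThat('I like to do this.')); // true
console.log(hasDoButNotThat('I like to do that.')); // false
```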

[edited by: DrDoc at 7:25 pm (utc) on Jan. 21, 2006]

DrDoc

7:19 pm on Jan 21, 2006 (gmt 0)

Of course, you could also do it all in one regexp:

/(^|[^a-z])((that[^a-z](.*[^a-z])?)do|do([^a-z]that([^a-z].*)?))([^a-z]|$)/i

... but that's a little bit messy. (The above one matches if the string contains both "do" and "that", regardless of order.)

Since this is the JavaScript forum, I decided to put together a test example in JavaScript. Since you may use another language, the code is not important. But at least it demonstrates the functionality.

<script type="text/javascript"> 
sentences = new Array('I do not like her.', 
  'I like to do this.', 
  'I like to do that.', 
  'Do what you like.', 
  'That I do like.', 
  'He likes to do that.', 
  'Do.', 
  'Do that.'); 
   
for(i = 0; i < sentences.length; i++) { 
//  if(sentences[i].match(/(^|[^a-z])do([^a-z]|$)/i) && !sentences[i].match(/(^|[^a-z])that([^a-z]|$)/i)) { 
  if(!sentences[i].match(/(^|[^a-z])((that[^a-z](.*[^a-z])?)do|do([^a-z]that([^a-z].*)?))([^a-z]|$)/i)) { 
    document.write("<b>" + sentences[i] + "</b><br>"); 
  } 
  else { 
    document.write("<i>" + sentences[i] + "</i><br>"); 
  } 
} 
</script>

Toggle between the two if statements.

SEOViking

8:54 pm on Jan 21, 2006 (gmt 0)

I tried that script and it just writes out all 8 sentences.
I didn't specify a language because, as I wrote in the beginning, I am just searching in Eclipse (Java IDE) with regex enabled, so I can't split the search into 2. And I wanted to match sentences that contain "do" but DON'T contain "that". Thanks for your response though.

Bernard Marx

9:58 pm on Jan 21, 2006 (gmt 0)

I tried that script and it just writes out all 8 sentences.

I believe it is meant to. The sentences that match are printed bold. Those that don't are italic.

Doc's regexps worked perfectly for me.

so i can't split the search into 2

In that case, use the regexp in the demo that does it all at once, i.e.:

/(^|[^a-z])((that[^a-z](.*[^a-z])?)do|do([^a-z]that([^a-z].*)?))([^a-z]|$)/i

/* change corrupted ¦¦ chars for pipes */

SEOViking

10:07 pm on Jan 21, 2006 (gmt 0)

Ok i found this on crazygrrl dot com:
Pattern: do(ugh)?nut

doughnut: matches
donut: matches

So i am just wondering how to negate that string between the parentheses.

This didn't work:
do .*(^that)+

still looking for an answer..

SEOViking

10:10 pm on Jan 21, 2006 (gmt 0)

Bernard Marx, ooops i didn't look at the if/else part of it =)

SEOViking

10:48 pm on Jan 21, 2006 (gmt 0)

I guess I should have just used the actual examples I need. I'm looking for lines that contain "html:text" but don't contain "maxlength". I tried to adapt that regex (replacing "do" with "html:text" and "that" with "maxlength"), but it didn't work...
Thanks so much for your replies, guys =)

DrDoc

10:49 pm on Jan 21, 2006 (gmt 0)

Negating in the regexp is always tricky, especially if you can have any number of words in between (as per your examples) or if the words can appear in any order (as per my additional examples). It is always easier if you can create a regexp that matches what you don't want, and then negate the result.

I'd go for the second regexp I posted earlier ... provided you can negate the result, that is.

SEOViking

10:59 pm on Jan 21, 2006 (gmt 0)

Thanks DrDoc for your time. Here is an example of exactly what the lines I am parsing look like (I know I should have just written this from the beginning, shame on me):
<html:text name="user" property="id" size="50" />
<html:text name="user" property="id" size="50" maxlength="100" />

so i want to find all of the first kind (but i don't know about the exact number of attributes in all such tags...)

It is so strange, because the idea is so simple:

does the line contain X?
if yes
does it not contain Y?
if yes
add to list of matches.
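
Outside of a search box, those steps really are that short. A minimal sketch in JavaScript, assuming plain substring checks are enough (no regex at all); the helper name `linesWithButWithout` is hypothetical and only illustrates the steps above:

```javascript
// Hypothetical helper: keep lines that contain `wanted` but not `unwanted`.
function linesWithButWithout(lines, wanted, unwanted) {
  return lines.filter(
    (line) => line.includes(wanted) && !line.includes(unwanted)
  );
}

const tags = [
  '<html:text name="user" property="id" size="50" />',
  '<html:text name="user" property="id" size="50" maxlength="100" />',
];

// Only the first tag (the one without maxlength) is kept.
console.log(linesWithButWithout(tags, 'html:text', 'maxlength'));
```

The difficulty in the thread comes from having to express both checks in a single pattern.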

the people who maintain regex should take a look at this problem and implement a simple solution for it, so that something like this could be run:
html:text.*!(maxlength)+

instead of some massively long sentence of tricky syntax, but hey, that's just my opinion =)

Bernard Marx

11:52 pm on Jan 21, 2006 (gmt 0)

Yes. How's about.

onload = function() 
{ 
  var wonder = new WebPageWonderfulness; 
  wonder.doStuffWhileISitBackAndHaveANiceCupOfTea(); 
}

Anyway. Having trouble with this. It must be a common one, and I have a sneaky feeling I solved it once, sometime, somewhere.

Quite likely to have something to do with negative lookahead.

(?!maxlength)

DrDoc

4:24 am on Jan 22, 2006 (gmt 0)

In that case ... using negative lookahead:

/(^|[^a-z])do(?![^a-z](.*[^a-z])?that[^a-z])([^a-z]|$)/i

Please note that the regexp is somewhat lengthy, simply because it ensures that each word (in this case "do" and "that") is surrounded by non-alpha characters.
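
The pattern above only rejects "that" when it comes after "do". If "that" should disqualify the line wherever it appears, one common variant is to put the negative lookahead at the very start of the line. The following is a sketch, not a drop-in replacement: it assumes the engine supports lookahead and the `\b` word boundary (JavaScript does, and so does java.util.regex, which Eclipse's search is generally understood to use), and the names are made up for the example.

```javascript
// Sample sentences from the demo in this thread.
const sentences = [
  'I do not like her.',
  'I like to do this.',
  'I like to do that.',
  'Do what you like.',
  'That I do like.',
  'He likes to do that.',
  'Do.',
  'Do that.',
  'Do this and that all the time.',
];

// The lookahead runs once at the start of the line and scans the whole line,
// so "that" is rejected whether it comes before or after "do".
const doWithoutThat = /^(?!.*\bthat\b).*\bdo\b/i;

const kept = sentences.filter((s) => doWithoutThat.test(s));
console.log(kept);
// [ 'I do not like her.', 'I like to do this.', 'Do what you like.', 'Do.' ]
```

Here `\b` stands in for the longer `(^|[^a-z])` and `([^a-z]|$)` groups, with the difference that `\b` also treats digits and underscores as word characters.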

DrDoc

4:27 am on Jan 22, 2006 (gmt 0)

<script type="text/javascript"> 
sentences = new Array('I do not like her.', 
  'I like to do this.', 
  'I like to do that.', 
  'Do what you like.', 
  'That I do like.', 
  'He likes to do that.', 
  'Do.', 
  'Do that.', 
  'Do this and that all the time.'); 
   
for(i = 0; i < sentences.length; i++) { 
  if(sentences[i].match(/(^|[^a-z])do(?![^a-z](.*[^a-z])?that[^a-z])([^a-z]|$)/i)) { 
    document.write("<b>" + sentences[i] + "</b><br>"); 
  } 
  else { 
    document.write("<i>" + sentences[i] + "</i><br>"); 
  } 
} 
</script>

Bold = should match
Italic = should not match

Compare with output.

SEOViking

1:41 pm on Feb 1, 2006 (gmt 0)

I finally got the answer I needed from uncle_alice on Sun's forum:
<html:text\b([^>"m]++|"[^"]*+"|m(?!axlength))*+>

It matches all JSP html:text tags that don't contain "maxlength". Thanks for all your help though, guys =)
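
A note for anyone trying this in the JavaScript used earlier in the thread: the `++` and `*+` in that pattern are possessive quantifiers, which Java's regex engine accepts but JavaScript's RegExp does not. The sketch below is a lookahead-based alternative, not a translation of uncle_alice's pattern. It assumes no attribute value contains a `>` character, and unlike the original it also rejects a tag where "maxlength" appears inside a quoted value. The variable names are hypothetical.

```javascript
// Assumption: no attribute value contains ">", so [^>]* never leaves the tag.
// The lookahead rejects the tag if "maxlength" appears anywhere before its ">".
const tagWithoutMaxlength = /<html:text\b(?![^>]*\bmaxlength\b)[^>]*>/;

const tags = [
  '<html:text name="user" property="id" size="50" />',
  '<html:text name="user" property="id" size="50" maxlength="100" />',
];

for (const tag of tags) {
  // Prints true for the first tag and false for the second.
  console.log(tagWithoutMaxlength.test(tag), tag);
}
```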