Simple Regular Expression confounding me

:(

         

Kolyana19

9:47 pm on Aug 21, 2005 (gmt 0)

10+ Year Member



Okay, this RE should be simple, but it has sucked hours out of my life and I still can't nail it down.

:disco: :disco: :disco:
:disco: :disco:

pattern = (^¦\s)+(:disco:)(\s¦$)+

I thought that the above pattern would match all instances of ":disco:" above, but it only matches EVERY OTHER one. If I apply the RE again, it then matches the ones it missed the first time, so I've deduced that the problem is the 'space': once ":disco: " is matched, the trailing space is no longer available to the next ":disco:", so that one doesn't match at all.

Thing is, I can't use {0,} (or the equivalent), because in something like:
http://example.com/page.asp?param=:disco:
I *do not* want it to match ... :disco: must be flanked by whitespace of some description ...

So, a really simple RE, but what am I missing?

[edited by: jatar_k at 3:51 pm (utc) on Aug. 22, 2005]
[edit reason] examplified [/edit]
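
A small sketch of the behaviour described above, written in JavaScript purely for illustration (the original question is about VBScript). It assumes a single-line test string with single spaces and uses real pipe characters where the thread shows the broken bar. Because the pattern consumes the whitespace on both sides, the space after one match is no longer available as the leading whitespace of the next token.

```javascript
// Hypothetical single-line test string with single spaces between tokens.
const text = ":disco: :disco: :disco:";

// The pattern from the post: an anchor or whitespace is consumed on each side.
const consuming = /(^|\s)+(:disco:)(\s|$)+/g;

// With the g flag, match() returns every full match in order.
const matches = text.match(consuming);

console.log(matches.length); // 2
console.log(matches);        // [ ':disco: ', ' :disco:' ]
```

The first match takes the space at index 7, so the middle token has nothing left to match as its leading whitespace and is skipped.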

KevinADC

2:47 am on Aug 22, 2005 (gmt 0)

10+ Year Member



You only want to match :disco: if it has a space before and after? Or what? Your explanation is confusing. Can you post the actual regexp code you are working with, and are you using perl or ASP?

Kolyana19

3:22 am on Aug 22, 2005 (gmt 0)

10+ Year Member



Sorry, I'm working with asp/vbscript.

The code I'm using is:
oRE.Pattern = "(^¦\s)(:disco:)($¦\s)", where "disco" is just a test string, but accurate for the type of content I'm trying to match.

I'm performing a search/replace on a body of text such as this:

---
http://example.com/page.asp?param=:disco:&param=:disco:

:disco: :disco:
:disco:
flank :disco:right
flank:disco: left
flank:disco:both
---

There needs to be whitespace before and after for a successful match, so the middle three :disco:'s would match, but the last three would not and nor would the one in the url.

I've tried using non-capturing groups, as in (?:^¦\s)(:disco:)(?:$¦\s), but when I use this as the basis for a replace statement, it also replaces the whitespace, and ultimately I only want to replace the :disco: and leave the whitespace around it intact.

I hope that makes sense ... sorry if my first post was confusing.

~ Natalya

[edited by: jatar_k at 3:52 pm (utc) on Aug. 22, 2005]
[edit reason] turned off smilies [/edit]
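
For readers following along, here is a hedged JavaScript sketch of why the whitespace disappears. A non-capturing group `(?:...)` only skips the numbered capture; whatever it matches is still part of the overall match, and a replace overwrites the whole match. One common workaround, shown second, is to capture the whitespace and write it back with `$1` and `$3`. The test string is made up for the example.

```javascript
// Hypothetical one-token test string.
const line = "a :disco: b";

// Non-capturing groups still consume the spaces, so replace() removes them.
const nonCapturing = /(?:^|\s)(:disco:)(?:$|\s)/g;
const stripped = line.replace(nonCapturing, "hiphop");
console.log(stripped); // "ahiphopb"

// Capturing the whitespace and putting it back preserves it in the output.
const capturing = /(^|\s)(:disco:)($|\s)/g;
const preserved = line.replace(capturing, "$1hiphop$3");
console.log(preserved); // "a hiphop b"
```

Note that the second pattern still consumes the trailing space, so two tokens separated by a single space would not both match in one pass.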

KevinADC

5:55 am on Aug 22, 2005 (gmt 0)

10+ Year Member




There needs to be whitespace before and after for a successful match

then maybe only one line will match (the first one):

:disco: :disco:
:disco:
flank :disco:right
flank:disco: left
flank:disco:both

but if there is no space after :disco: in the first line then none of them will match as none of them has a space before and after :disco:

I am not used to looking at ASP/vbscript code but the regexp is similar enough that maybe I can help:

(^¦\s)+ one or more spaces at the beginning of the string

(:disco:) just :disco:

(\s¦$)+ one or more spaces at the end of the string

written in perl (without parentheses) this would be:

$string =~ /^\s+:disco:\s+$/

but since that is tied to the string anchors this will only match:

string = ' :disco: ';

maybe you shouldn't use the string anchors and just make it:

(¦\s)+(:disco:)(\s¦)+

and then use whatever operator there is in ASP to tell the regexp to match that pattern over the whole string, which would be the "g" operator in perl:

$string =~ /\s+:disco:\s+/g;

[edited by: jatar_k at 3:52 pm (utc) on Aug. 22, 2005]
[edit reason] turned off smilies [/edit]

KevinADC

6:00 am on Aug 22, 2005 (gmt 0)

10+ Year Member



I also suggest you find a forum that has a specific ASP or vbscript forum where you can ask your question. This one is a perl/cgi forum so ASP questions aren't really expected and might not get answered properly. tek-tips.com and devshed.com (as well as many other forums) have ASP forums.

My apologies to this forum if recommending other forums is against any written rules.

wruppert

1:54 pm on Aug 22, 2005 (gmt 0)

10+ Year Member



The following seems to work. It uses the lookbehind (?<=\s) and lookahead (?=\s) patterns to avoid eating up the whitespace.

-----------------

#!/usr/bin/perl
use warnings;
use strict;

my $string = ":disco: :disco::disco: :disco: :disco: :disco:";
$string =~ s/(^¦(?<=\s))(:disco:)($¦(?=\s))/hiphop/g;

print $string;

-----------------

Prints
hiphop :disco::disco: hiphop hiphop hiphop

As always, the broken vertical bar is supposed to be the unbroken vertical bar (pipe symbol).
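
For comparison, a rough JavaScript translation of the same lookaround idea. This is a sketch, not part of the original answer, and it assumes an engine that supports lookbehind (added to JavaScript in ES2018; older engines report a syntax error). The lookbehind and lookahead check for whitespace without consuming it, so adjacent tokens can share a single space.

```javascript
// Same test string as the Perl example above.
const input = ":disco: :disco::disco: :disco: :disco: :disco:";

// (?<=\s) and (?=\s) assert whitespace without making it part of the match.
const lookaround = /(?:^|(?<=\s)):disco:(?:$|(?=\s))/g;

const output = input.replace(lookaround, "hiphop");
console.log(output); // "hiphop :disco::disco: hiphop hiphop hiphop"
```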

Kolyana19

6:08 pm on Aug 22, 2005 (gmt 0)

10+ Year Member



Kevin, I posted it here because I think the perl/cgi 'world' tends to be a little more familiar with regular expressions, so even if the exact syntax doesn't match, someone may be able to point me in the right direction.

Wruppert ... I'm going to try that, thanks. I know that if I use that exact syntax vbscript barfs, but the look-ahead, look-behind thing is possible, so I'm going to see if I can get it to work! Thank you :)

KevinADC

7:19 pm on Aug 22, 2005 (gmt 0)

10+ Year Member



another way that seems a bit more readable to me:

my $string = ":disco: :disco::disco: :disco: :disco: :disco:";
$string =~ s/(\s+)(:disco:)(\s+)/$1hiphop$3/g;
print $string;

you could reduce the extra spaces to a single space, if desired, and make it even simpler:

my $string = ":disco: :disco: :disco: :disco: :disco: :disco:";
$string =~ s/\s+:disco:\s+/ hiphop /g;
print $string;

I don't see the need to use the string anchors (^$) if you are going to check the entire string for all pattern matches. The "g" operator will make sure your regexp searches for the pattern from beginning to end.
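
A quick way to check the unanchored version, sketched in JavaScript rather than Perl. It assumes the test string has single spaces exactly as displayed (the follow-up post notes that the original had extra spaces that were lost). With `\s+` required and consumed on both sides, a token at the very start or end of the string cannot match, and a token that shares a single space with a previous match is skipped.

```javascript
// Test string as displayed in the post, with single spaces.
const sample = ":disco: :disco::disco: :disco: :disco: :disco:";

// Whitespace is required and consumed on both sides, then written back.
const unanchored = /(\s+)(:disco:)(\s+)/g;

const result = sample.replace(unanchored, "$1hiphop$3");
console.log(result); // ":disco: :disco::disco: hiphop :disco: :disco:"
```

Only the fourth token is replaced here: the first has no leading whitespace, the last has no trailing whitespace, and the fifth loses its leading space to the match before it.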

KevinADC

7:26 pm on Aug 22, 2005 (gmt 0)

10+ Year Member



the extra spaces got lost in the post above

Kolyana19

9:33 pm on Aug 23, 2005 (gmt 0)

10+ Year Member



Sorry guys, I've been offline for a couple of days so I wasn't really able to check in on this thread.

After some testing, it seems that doing this in VBScript is going to be a non-starter.

To begin with, there doesn't seem to be a "lookbehind" feature, and even when using "non-capturing" syntax, the replace statement still replaces the 'non-captured' text ... in other words, it matches the whitespace and then promptly removes it. It's not meant to do this, right?

The 'extra' string anchors (beginning and end, ^ and $) are being used, because otherwise - again, in VBScript - the pattern will not match a :disco: at the start or end of the string.

Non-capturing definitely seems to be the way to go with this and even when asking on other forums (VB ones) this is the general consensus, but the support seems to be weak and patchy, so I'm going to experiment with a serverside Javascript function that may well do the trick. Maybe there's better support there.

Kolyana19

10:02 pm on Aug 23, 2005 (gmt 0)

10+ Year Member



After a little testing, the .replace functionality here is *also* replacing the 'non-captured' parts of the pattern ... so maybe I'm not understanding something here.

This strips the 'un-captured' whitespace ...
(?<= gives a syntax error, so I need to see if JS has an equivalent)
var output = sString.replace(/(?:^¦\s):disco:(?=$¦\s)/gi, 'chacha');

(Sorry, tried to edit, but the post was 'too old').

[edited by: jatar_k at 4:43 pm (utc) on Aug. 24, 2005]
[edit reason] disabled smilies [/edit]
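
One possible way around the missing lookbehind, sketched here as a suggestion rather than a confirmed fix from the thread: capture the leading anchor-or-whitespace in a group and write it back with `$1`, and keep the trailing check as a lookahead so the trailing whitespace is never consumed. The sample text is the one from the earlier post; the variable names are just for the example. Whether the same pattern carries over unchanged to the VBScript RegExp object is an assumption that would need testing.

```javascript
// Sample body of text from earlier in the thread.
const sString = [
  "http://example.com/page.asp?param=:disco:&param=:disco:",
  "",
  ":disco: :disco:",
  ":disco:",
  "flank :disco:right",
  "flank:disco: left",
  "flank:disco:both",
].join("\n");

// Group 1 captures the leading whitespace (or the empty start-of-string match);
// the lookahead checks the trailing side without consuming it.
const pattern = /(^|\s):disco:(?=$|\s)/gi;

// "$1" writes the captured leading whitespace back in front of the replacement.
const replaced = sString.replace(pattern, "$1chacha");
console.log(replaced);

// Tokens sharing a single space are all replaced in one pass.
const row = ":disco: :disco: :disco:".replace(pattern, "$1chacha");
console.log(row); // "chacha chacha chacha"
```

In the multi-line sample only the three whitespace-flanked tokens in the middle are replaced; the two in the URL and the three "flank" lines are left alone.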