Forum Moderators: coopster & phranque

RegExp Guru's

multiple values in one RegExp

         

Garoun

8:55 pm on Nov 12, 2003 (gmt 0)

10+ Year Member



I posted about this in the JavaScript forum, but since I know most of you regular expression experts hang around the Perl forum, perhaps you can help me figure out the required regular expression before I go crazy.

Sample Data:


<P>Active Component Answer
<P>
<P>One Answer with 3 bullets.</P>
<UL><LI dir=ltr>Bullet 1 - Text
<UL>
<LI dir=ltr>Bullet&nbsp;2 - Text
<UL>
<LI dir=ltr>Bullet 3 - Text</LI></UL></LI></UL></LI></UL>
<P>No Bullet Text to end this Answer</P>

I've been using: (formatted in javascript)
$someString =~ s/<.*?>//gis ;
$someString =~ s/<[^>]*>//gis ;

However, now I need to SAVE some data to use in the substitution and remove the rest. Please post any RegExp you know of that would produce the final output below.

Desired output after the RegExp:


<P>Active Component Answer
<P>
<P>One Answer with 3 bullets.</P>
<UL><LI>Bullet 1 - Text
<UL>
<LI>Bullet&nbsp;2 - Text
<UL>
<LI>Bullet 3 - Text</LI></UL></LI></UL></LI></UL>
<P>No Bullet Text to end this Answer</P>

P.S.
Primarily, what I plan to do is read each 'tag' <sometext> and, if it contains a space, keep all the text after the < but before the space, and remove everything from the space up to the >:
<LI dir=ltr> -> $1 = LI -> <LI>

BUT the same RegExp should leave a tag that is already properly formatted (no space) exactly as it is:
<UL> -> $1 = UL -> <UL>
</UL> -> $1 = /UL -> </UL>

Thanks for any aid. I'm slowly learning RegExp, but man, are they confusing sometimes.
6 hrs and still going... think it's lunch time before I die.

DrDoc

9:52 pm on Nov 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How about...

$someString =~ s/(</?[a-z]+)( [^.]*)(>)/$1$2/gis ;

coopster

10:45 pm on Nov 12, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I think you may have to escape that first slash, and I think you want the third replacement:
$someString =~ s/(<\/?[a-z]+)( [^.]*)(>)/$1$3/gis ;
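One thing to watch with this pattern: [^.]* means "anything except a literal dot", so it can run past the end of the tag and stop only at the last > before the next dot (or the end of the string). Below is a minimal JavaScript sketch of that effect, next to a variant that uses [^>]* instead. The two-tag sample string and the variable names are made up for illustration, and the s flag is left out because it only changes how a bare . matches.

```javascript
// Hypothetical two-tag sample; there is no "." anywhere after the first tag.
const sample = '<LI dir=ltr>A</LI><P>B</P>';

// Pattern as posted above: the attribute part is "anything except a dot".
const dotClass = /(<\/?[a-z]+)( [^.]*)(>)/gi;

// Variant: the attribute part is "anything except >", so it stays inside one tag.
const gtClass = /(<\/?[a-z]+)( [^>]*)(>)/gi;

const withDotClass = sample.replace(dotClass, '$1$3');
const withGtClass = sample.replace(gtClass, '$1$3');

console.log(withDotClass); // <LI>
console.log(withGtClass);  // <LI>A</LI><P>B</P>
```

With the sample data from the first post, the only dot sits before the first <LI dir=ltr>, so the [^.]* version would swallow everything from that tag to the end of the string.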

killroy

10:52 pm on Nov 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't ask me for the js syntax but basically you want to match
(<[^ >]+)[^>]*>
and replace it with
$1>

or alternatively:

(?<=<[^ >]+)([^>]*)(?=>)
and simply delete.

This is in my own scripting lang, but I think it's Perl compatible. Dunno if JS supports assertions.
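For anyone who wants to try both ideas in JavaScript, here is a rough sketch. It assumes an engine with lookbehind support (ES2018 or later). The sample string and variable names are invented for the example. Note that the lookbehind form, taken literally, is already satisfied one character into the tag name, so it also deletes part of the name. The last variant, which requires the deleted part to start with whitespace, is my own adjustment and not part of the post.

```javascript
// Hypothetical sample with one tag that has an attribute and several that do not.
const sample = '<UL><LI dir=ltr>Bullet 1 - Text</LI></UL>';

// Approach 1: capture "<" plus the tag name, drop the rest of the tag, put ">" back.
const captured = sample.replace(/(<[^ >]+)[^>]*>/g, '$1>');

// Approach 2 as posted: the lookbehind matches after "<" plus a single character,
// so the deletion begins inside the tag name.
const lookbehindAsPosted = sample.replace(/(?<=<[^ >]+)([^>]*)(?=>)/g, '');

// Adjusted approach 2 (an assumption, not from the post): only delete a run
// that begins with whitespace, which leaves the tag name alone.
const lookbehindWithSpace = sample.replace(/(?<=<[^ >]+)\s[^>]*(?=>)/g, '');

console.log(captured);            // <UL><LI>Bullet 1 - Text</LI></UL>
console.log(lookbehindAsPosted);  // tag names come out truncated
console.log(lookbehindWithSpace); // <UL><LI>Bullet 1 - Text</LI></UL>
```

The capture-and-replace form is the easier one to port, since it needs no assertions at all.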

SN

DrDoc

11:18 pm on Nov 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey, thanks for catching that, coopster ;)
You are, of course, absolutely right.

Garoun

12:55 am on Nov 13, 2003 (gmt 0)

10+ Year Member



You all never let me down, especially when it comes to RegExp's :)

After spending all day working on it, I finally called it a day and decided it was time to go home. I'll be sure to try out these suggestions first thing.

I'll be sure to let you know how it goes.

Thanks

Garoun

7:15 pm on Nov 13, 2003 (gmt 0)

10+ Year Member



I'll say it again... all of the folks here keep those of us who have already lost our minds to excessive coding from being committed :)

The final expression that did the trick was

string.replace(/(<[^ >\s:]+)[^>]*>/gim,"$1>")

so I'm guessing in Perl it would be
s/(<[^ >\s:]+)[^>]*>/$1>/gim ;

Ended up using killroy's initial logic, but had to add \s (and :) to the character class so the stored value would not include the whitespace or any text following it.
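As a quick check, here is the final expression run against the sample data from the first post, as a small self-contained JavaScript sketch. The function name stripAttributes is made up for the example. The i and m flags are kept as posted, although the pattern has no literal letters and no ^ or $ anchors, so they change nothing here.

```javascript
// Sample data from the first post, one array entry per line.
const input = [
  '<P>Active Component Answer',
  '<P>',
  '<P>One Answer with 3 bullets.</P>',
  '<UL><LI dir=ltr>Bullet 1 - Text',
  '<UL>',
  '<LI dir=ltr>Bullet&nbsp;2 - Text',
  '<UL>',
  '<LI dir=ltr>Bullet 3 - Text</LI></UL></LI></UL></LI></UL>',
  '<P>No Bullet Text to end this Answer</P>',
].join('\n');

// stripAttributes is a hypothetical helper name wrapping the posted expression.
// Group 1 keeps "<" plus the tag name (up to whitespace, ":" or ">");
// everything else up to the closing ">" is dropped.
function stripAttributes(html) {
  return html.replace(/(<[^ >\s:]+)[^>]*>/gim, '$1>');
}

const output = stripAttributes(input);
console.log(output); // same text, with every " dir=ltr" removed
```

Two caveats that follow from the pattern itself: a > inside a quoted attribute value ends the match early, and because : is excluded from the name, a namespaced tag such as <o:p> would be cut back to <o>.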

Thanks again. I will get these RegExps down yet, especially with this editor I've been working on.