Webmaster General Forum

    
Regular Expression - Select from match to end of file
Need a little help with a simple regular expression
Sootah




msg:3888742
 6:05 am on Apr 9, 2009 (gmt 0)

I am using Dreamweaver to clean up some files, and I need a regular expression that will select everything from a matched bit of text through the end of the file.

The expression <!-- ABT.* will match the <!-- ABT tag that I'm looking for in the file until the end of the line, but I need to match from there on until the end of the file.

Help!?

 

Sootah




msg:3888744
 6:15 am on Apr 9, 2009 (gmt 0)

Never mind. Looks like <!-- ABT(\s|.)* will do it. Too bad that took me HOURS to figure out!
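
For anyone trying this outside Dreamweaver, here is a minimal Python sketch of the same idea. The sample text is made up for illustration; only the <!-- ABT marker comes from this thread. The alternation works because \s matches the newlines that . skips by default.

```python
import re

# Hypothetical file contents; only the "<!-- ABT" marker comes from the thread.
text = "<p>keep me</p>\n<!-- ABT start -->\nline two\nline three\n"

# (\s|.)* reaches the end of the text because \s matches newlines
# while . does not. (?:...) just avoids creating a capture group.
pattern = re.compile(r"<!-- ABT(?:\s|.)*")

matched = pattern.search(text).group(0)  # marker through end of text
cleaned = pattern.sub("", text)          # text with that span removed
```

Here matched holds everything from the marker to the end of the text, and cleaned is what would be left after a replace-with-nothing.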

rocknbil




msg:3889096
 4:52 pm on Apr 9, 2009 (gmt 0)

Too bad that took me HOURS to figure out!

6:05 am on April 9, 2009 (utc 0)...

6:15 am on Apr 9, 2009 (utc 0)

? :-)

Did you try the single-line/global modifiers? In Perl, /s is the one that lets . match newlines; /m only changes where ^ and $ match.

\<\!\-\- - Match starts with this. I escape <, !, and - out of habit; some of these have special meanings elsewhere (- inside a character class, for example), but here the escaping isn't necessary.

\s* - followed by zero or more whitespace characters, typos are a pain

ABT - followed by ABT, if you want case-insensitive, add i modifier

.* - followed by zero or more of ANY character (newlines included only when the s modifier is on)

You might add $ at the end of the regexp, but this may cause it to stop matching at the end of the first line.

$match =~ /\<\!\-\-\s*ABT.*/gs;

or

$match =~ /\<\!\-\-\s*ABT.*/igs; # case-insensitive

That's Perl; use preg_match() for PHP.
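
As a quick check of how the modifiers behave, here is a small Python sketch. Python's re.MULTILINE and re.DOTALL correspond to Perl's /m and /s; the sample text and variable names are invented for illustration.

```python
import re

# Invented sample text with the marker on the second line.
text = "intro\n<!--  abt marker -->\nsecond line\nlast line"

pattern = r"<!--\s*ABT.*"

# MULTILINE only changes where ^ and $ match; . still stops at a newline.
one_line = re.search(pattern, text, re.IGNORECASE | re.MULTILINE).group(0)

# DOTALL lets . match newlines, so the match runs to the end of the text.
to_end = re.search(pattern, text, re.IGNORECASE | re.DOTALL).group(0)
```

With the multiline flag alone the match ends at the end of the marker's line; with the dot-matches-newline flag it continues to the end of the text.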

Sootah




msg:3889203
 6:36 pm on Apr 9, 2009 (gmt 0)

Ooh, I like.

I actually need something now that'll find ALL HTML tags with the exception of line breaks ( <br> and <br /> ).

I really have done very little with regular expressions, so I can get it to sort of work, but my little bit of code will select everything on a line between < and > even if there's another > in the way. It doesn't stop at the first occurrence. I also don't know how to make it ignore the <br> tags.

Any suggestions there?

Sootah




msg:3889299
 7:53 pm on Apr 9, 2009 (gmt 0)

\<.*?\> will nicely find anything located within <>, but now I need to figure out how to make it exclude <br> and <br />.

rocknbil




msg:3889380
 9:27 pm on Apr 9, 2009 (gmt 0)

You're close - however

\<.*?\>

.* means zero or more of any character, which can sometimes snag the > even with the ? quantifier. A more standard approach is to begin with <, followed by zero or more /, which catches closing, opening, and "empty" tags alike, followed by one or more of anything NOT a >, ending with >:

$match =~ /<\/*[^>]+>/g;
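
To see the difference, here is a rough Python sketch comparing a greedy .* with the negated character class. The HTML snippet is a made-up example.

```python
import re

# Made-up one-line HTML snippet.
html = '<p class="x">Hello<br>world</p>'

# Greedy .* runs to the LAST > on the line, swallowing everything between.
greedy = re.findall(r"<.*>", html)

# [^>]+ cannot cross a >, so each tag is matched on its own.
tags = re.findall(r"</*[^>]+>", html)
```

The greedy version returns the whole line as one match, while the character-class version returns the three tags separately.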

As to how to avoid br, I'm not sure at the moment: a negated character class such as [^br] would exclude b and r individually rather than the sequence br, so you need to examine each instance of the match. You might have to split all the words into an array and step through them, with program logic to process it reliably:


foreach $word (@words) {
    if ($word =~ /<\/*[^>]+>/) {
        # I despise XHTML syntax when it's not necessary . . .
        if ($word =~ /<br\s*\/*>/i) { $out .= " $word"; }
        # No else required, this will skip anything but breaks
    }
    else { $out .= " $word"; }
}

Perl example given, easy to create in other languages. There may be a pure regexp approach, but I'm not . . . THAT good. :-)
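
One possible pure-regexp approach is a negative lookahead, which Perl, JavaScript, and Python all support. The Python sketch below is only an illustration, not something tested in Dreamweaver; the sample HTML and variable names are assumptions.

```python
import re

# Made-up sample with several spellings of the line-break tag.
html = "<div><b>bold</b><br>line<br />two<BR/></div>"

# (?!/?br\b) refuses to match when the tag name is exactly "br";
# the \b keeps tags that merely start with "b" or "br..." matching.
not_br = re.compile(r"<(?!/?br\b)[^>]+>", re.IGNORECASE)

found = not_br.findall(html)      # every tag except the breaks
stripped = not_br.sub("", html)   # breaks survive, other tags are removed
```

Here found lists the div and b tags only, and stripped keeps the text plus all three line breaks.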

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved