Regular Expression - Select from match to end of file - Webmaster General forum at WebmasterWorld

Forum Moderators: phranque

Regular Expression - Select from match to end of file

Need a little help with a simple regular expression

Sootah

6:05 am on Apr 9, 2009 (gmt 0)

I am using Dreamweaver to clean up some files, and I need to make a regular expression that will select from a bit of text that's matched until the end of the file.

The expression <!-- ABT.* will match the <!-- ABT tag that I'm looking for in the file until the end of the line, but I need to match from there on until the end of the file.

Help!?

Sootah

6:15 am on Apr 9, 2009 (gmt 0)

Never mind. Looks like <!-- ABT(\s|.)* will do it. Too bad that took me HOURS to figure out!
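For anyone landing here later, the reason this works is that . does not match a newline by default, while \s does, so (\s|.) covers every character. A minimal sketch in Python (my choice for illustration only; Dreamweaver uses JavaScript-style regexes, and the sample text below is invented):

```python
import re

# Hypothetical file contents, invented for this sketch.
text = "<p>keep me</p>\n<!-- ABT start -->\nline two\nline three\n"

# '.' does not match '\n' by default, so this stops at the end of the line.
line_only = re.search(r"<!-- ABT.*", text).group(0)

# '\s' does match '\n', so (\s|.)* runs to the end of the text.
to_end = re.search(r"<!-- ABT(\s|.)*", text).group(0)

# [\s\S]* is an equivalent spelling that avoids the alternation.
to_end_alt = re.search(r"<!-- ABT[\s\S]*", text).group(0)

print(repr(line_only))
print(repr(to_end))
```

The [\s\S]* form should behave the same way in most regex flavors and tends to be faster on large files, since it avoids trying two alternatives per character.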

rocknbil

4:52 pm on Apr 9, 2009 (gmt 0)

Too bad that took me HOURS to figure out!

6:05 am on April 9, 2009 (utc 0)...
6:15 am on Apr 9, 2009 (utc 0)

? :-)

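Did you try the modifiers? In Perl it's s (single-line) that lets . match newlines; m (multiline) only changes where ^ and $ match, and g just repeats the match.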

\<\!\-\- - The match starts with this. I escape <, !, and - out of habit; only - has a special meaning, and only inside a character class, so the escaping isn't strictly necessary here.

\s* - followed by zero or more whitespace characters, since typos are a pain

ABT - followed by ABT, if you want case-insensitive, add i modifier

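.* - followed by zero or more of ANY character (any character except a newline, unless the s modifier is on)

You might add $ at the end of the regexp, but with the m modifier $ also matches at the end of every line, so it won't anchor the match to the end of the file.

$match =~ /\<\!\-\-\s*ABT.*/s; # s lets . match newlines, so .* runs to the end of the file

$match =~ /\<\!\-\-\s*ABT.*/is; # case-insensitive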

That's Perl; use preg_match() for PHP.
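To make the modifier difference concrete, here is a small sketch in Python, where re.DOTALL plays the role of Perl's s and re.MULTILINE the role of m. The sample text and the strip_from_marker name are made up for illustration:

```python
import re

# Hypothetical input, invented for this sketch.
text = "header\n<!--ABT\nfooter line 1\nfooter line 2\n"
pattern = r"<!--\s*ABT.*"

# MULTILINE only changes where ^ and $ match; '.' still stops at the newline.
with_m = re.search(pattern, text, re.MULTILINE).group(0)

# DOTALL lets '.' match newlines, so '.*' reaches the end of the text.
with_s = re.search(pattern, text, re.DOTALL).group(0)


def strip_from_marker(s):
    """Delete everything from the ABT comment to the end of the string."""
    return re.sub(pattern, "", s, flags=re.DOTALL | re.IGNORECASE)


print(repr(with_m))
print(repr(with_s))
print(repr(strip_from_marker(text)))
```

Here the multiline version matches only the marker itself, while the DOTALL version takes everything through the final newline.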

Sootah

6:36 pm on Apr 9, 2009 (gmt 0)

Ooh, I like.

I actually need something now that'll find ALL HTML tags with the exception of line breaks (<br> and <br />).

I really have done very little with regular expressions, so I can get it to sort of work, but my little bit of code will select everything on a line between < and > even if there's another > in the way. It doesn't stop at the first occurrence. I also don't know how to make it ignore the <br> tags.

Any suggestions there?

Sootah

7:53 pm on Apr 9, 2009 (gmt 0)

\<.*?\> will nicely find anything located within <>, but now I need to figure out how to make it exclude <br> and <br />.

rocknbil

9:27 pm on Apr 9, 2009 (gmt 0)

You're close. However, in

\<.*?\>

.*? means zero or more of any character, as few as possible. On its own it stops at the first >, but . won't cross a line break, and inside a longer pattern it can still run past a > to make the rest match. A more standard approach is to begin with <, followed by zero or more /, which catches opening, closing, and "empty" tags alike, followed by one or more of anything that is NOT a >, ending with >:

$match =~ /<\/*[^>]+>/g;
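A quick comparison of the greedy, lazy, and negated-class versions, sketched in Python with made-up sample strings (the regex syntax is the same as in the Perl line above):

```python
import re

# Hypothetical samples, invented for this sketch.
line = "a <b>bold</b> and <i>italic</i> z"
multi = '<a\n href="x">'  # a tag broken across two lines

# Greedy: runs from the first '<' to the LAST '>' on the line.
greedy = re.findall(r"<.*>", line)

# Lazy: stops at the first '>' after each '<'.
lazy = re.findall(r"<.*?>", line)

# Negated class: same result here, and it cannot cross a '>' at all.
negated = re.findall(r"</*[^>]+>", line)

# '.' does not match a newline, so the lazy form misses the split tag,
# while [^>] matches newlines and finds it.
lazy_multi = re.findall(r"<.*?>", multi)
negated_multi = re.findall(r"</*[^>]+>", multi)

print(greedy)
print(lazy)
print(negated_multi)
```

On a single line the lazy and negated-class forms agree; the difference shows up with tags that wrap across lines.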

As to how to avoid br, I'm not sure at the moment: a negated character class such as [^br] excludes the single characters b and r, not the sequence br, so you need to examine each instance of the match. You might have to split all the words into an array and step through them, with program logic to process it reliably:


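my $out = '';
# Assumes @words holds the text already split so that each tag is one element
foreach my $word (@words) {
 if ($word =~ /<\/*[^>]+>/) {    # this element contains a tag
  # I despise XHTML syntax when it's not necessary . . .
  if ($word =~ /<br\s*\/*>/i) { $out .= " $word"; }    # keep <br>, <br/>, <br />
  # No else required, this will skip anything but breaks
 }
 else { $out .= " $word"; }    # plain text, keep it
}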

Perl example given, easy to create in other languages. There may be a pure regexp approach, but I'm not . . . THAT good. :-)
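For what it's worth, one pure-regexp route is a negative lookahead, (?!...), which rejects a match at a given < if a break tag starts there. A minimal sketch in Python (the lookahead syntax is the same in Perl; the NON_BR_TAG and strip_tags_keep_br names and the sample string are made up, and it assumes break tags carry no attributes):

```python
import re

# Match any tag unless it is exactly <br>, <br/> or <br /> (any case).
# The lookahead (?!br\s*/?>) fails the match when a break tag follows '<'.
NON_BR_TAG = re.compile(r"<(?!br\s*/?>)/?[^>]+>", re.IGNORECASE)


def strip_tags_keep_br(html):
    """Remove every tag except plain line breaks."""
    return NON_BR_TAG.sub("", html)


# Hypothetical sample, invented for this sketch.
sample = "<p>one<br>two<BR />three</p><b>bold</b><brx>"

print(NON_BR_TAG.findall(sample))
print(strip_tags_keep_br(sample))
```

Note that <b> and a hypothetical <brx> are still matched, because the lookahead only rejects br followed directly by optional whitespace, an optional slash, and >. A break with attributes, such as <br clear="all">, would be stripped by this version.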