The expression <!-- ABT.* will match the <!-- ABT tag that I'm looking for in the file until the end of the line, but I need to match from there on until the end of the file.
Help!?
Too bad that took me HOURS to figure out!
6:05 am on April 9, 2009 (utc 0)...6:15 am on Apr 9, 2009 (utc 0)
? :-)
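Did you try the s (single-line) modifier? By default . does not match a newline, which is why your .* stops at the end of the line. The m and g modifiers won't help here: m only changes where ^ and $ match, and g only repeats the match.
\<\!\-\- - the match starts with this. I escape <, !, and - out of habit; none of them is special outside a character class (only - has a special meaning inside one), so the escaping is not necessary.
\s* - followed by zero or more whitespace characters, since typos are a pain
ABT - followed by ABT; if you want case-insensitive matching, add the i modifier
.* - followed by zero or more of any character; with the s modifier that includes newlines
You could add $ at the end of the regexp, but it is unnecessary: with the s modifier a greedy .* already runs to the end of the string.
$match =~ /\<\!\-\-\s*ABT.*/s;
or
$match =~ /\<\!\-\-\s*ABT.*/is; # case-insensitive
That's Perl; use preg_match() for PHP, where the same s modifier applies.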
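For what it's worth, here is a minimal Perl sketch of the difference the s modifier makes. It assumes the file has already been read into one string; the variable names and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the file contents.
my $html = "<p>intro</p>\n<!-- ABT start -->\nline two\nline three\n";

# Without /s, "." stops at the first newline, so only one line is matched.
my ($one_line) = $html =~ /(<!--\s*ABT.*)/;

# With /s, "." also matches newlines, so the match runs to the end of the string.
my ($to_end) = $html =~ /(<!--\s*ABT.*)/s;

print "without /s: $one_line\n";
print "with /s:    $to_end";
```

The first print shows only the comment line; the second shows everything from the comment to the end of the sample string.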
I actually need something now that'll find ALL HTML tags with the exception of line breaks ( <br> and <br /> ).
I really have done very little with regular expressions, so I can get it to sort of work, but my little bit of code will select everything on a line between < and > even if there's another > in the way. It doesn't stop at the first occurrence. I also don't know how to make it ignore the <br> tags.
Any suggestions there?
\<.*?\>
.* means zero or more of any character, so even with the non-greedy ? quantifier it can still run past a > if that is what it takes for the rest of a pattern to match. A more standard approach is to begin with <, followed by zero or more /, which catches both opening and closing tags, followed by one or more of anything that is NOT a > (which also covers "empty" tags such as <br />), ending with >:
$match =~ /<\/*[^>]+>/g;
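To see why the negated character class behaves better than a plain greedy .*, here is a small sketch. The sample line is invented, and in list context with the g modifier the match simply returns every hit.

```perl
use strict;
use warnings;

# Hypothetical sample line with several tags on it.
my $line = 'a <b>bold</b> word, a <a href="x.html">link</a><br />';

# Greedy .* runs from the first < to the LAST > on the line.
my @greedy = $line =~ /<.*>/g;

# [^>]+ cannot cross a >, so each tag is matched on its own.
my @tags = $line =~ /<\/*[^>]+>/g;

print scalar(@greedy), " greedy match\n";
print join(' | ', @tags), "\n";
```

The greedy version finds one long match covering almost the whole line, while the [^>]+ version finds the five tags separately.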
As for how to skip br, I'm not sure at the moment: a negated character class such as [^br] excludes the single characters b and r anywhere, not the sequence br, so you need to examine each match. You might have to split the text into an array of words and step through them, with program logic to process it reliably:
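# Assumes each tag sits in a single element of @words (no spaces inside tags)
foreach my $word (@words) {
    # Is this word a tag?
    if ($word =~ /<\/*[^>]+>/) {
        # I despise XHTML syntax when it's not necessary . . .
        # Keep the tag only if it is a line break
        if ($word =~ /<br\s*\/*>/i) { $out .= " $word"; }
        # No else required, this will skip anything but breaks
    }
    # Plain words are always kept
    else { $out .= " $word"; }
}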
Perl example given, easy to create in other languages. There may be a pure regexp approach, but I'm not . . . THAT good. :-)
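One possible pure-regexp route, offered only as a sketch and not as the method from the posts above, is a negative lookahead: (?!\/?br\b) placed right after the < rejects any tag named br before the rest of the pattern runs. The sample string and variable names here are made up, and it assumes you want to strip the matched tags.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $html = '<p>One<br>two<BR />three <a href="x.html">link</a></p>';

# (?!\/?br\b) is a negative lookahead: the tag is skipped if its name is br.
# \b keeps it from also rejecting names that merely start with "br".
# Copy $html first, then strip every remaining tag from the copy.
(my $stripped = $html) =~ s/<(?!\/?br\b)[^>]+>//gi;

print "$stripped\n";
```

On this sample it prints One<br>two<BR />three link, leaving both break tags in place. It still relies on [^>]+, so a > inside an attribute value would trip it up.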