function find_in_tags ($tag, $needle, $haystack) {
if (preg_match( "/<$tag>(.*$needle.*)<\/$tag>/si", $haystack, $needle )) {echo "found";
} else {echo "Not found";}
}
The problem I've encountered is that some tags will have additional info in them, such as <body bgcolor="000000"> etc.
I think I can get around this by matching everything but the > symbol, with something like
preg_match( "/<$tag(.*^>)>
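For reference, the usual way to allow attributes is a negated character class, [^>]*, placed right after the tag name: it consumes everything up to the closing > of the opening tag. A minimal sketch, assuming the tag name and the needle are plain text; the function name tag_contains is made up for this example:

```php
<?php
// Hypothetical helper: true if $needle appears anywhere inside <$tag ...>...</$tag>.
function tag_contains(string $tag, string $needle, string $haystack): bool
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $t = preg_quote($tag, '/');
    $n = preg_quote($needle, '/');
    // [^>]* allows optional attributes such as bgcolor="000000".
    $pattern = '/<' . $t . '[^>]*>.*' . $n . '.*<\/' . $t . '>/si';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(tag_contains('body', 'hello', '<body bgcolor="000000">hello</body>')); // bool(true)
var_dump(tag_contains('body', 'hello', '<body>goodbye</body>'));                // bool(false)
```

One caveat with this sketch: for $tag = "p" the opening part would also accept <pre>, since [^>]* happily consumes "re". Putting \b right after the tag name is one way to avoid that.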
I changed [^>]*>( to [^>].*? in order to try and match <p>keyword</p> and also <p class='someclass'>keyword</p>. But this change doesn't seem to help.
function find_in_tags ($tag, $needle, $haystack) {
if (preg_match( '/<$tag[^>].*?>(.*$needle.*)<\/$tag>/si', $haystack, $needle )) {echo "found";
} else {echo "not found";}
}
find_in_tags ("p", "this", "<p class='someclass'>this</p>");
Grrr @ regex ;)
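A likely reason the version above always prints "not found": PHP only interpolates variables inside double-quoted strings, so with single quotes the pattern contains the literal characters $tag and $needle. A short sketch of the difference (the variable names $single and $double are just for illustration):

```php
<?php
$tag = 'p';

// Single quotes: no interpolation, so PCRE sees the characters "$tag".
$single = '/<$tag[^>]*>/';
// Double quotes: $tag is replaced by its value. The braces keep PHP from
// reading the following "[" as the start of an array index.
$double = "/<{$tag}[^>]*>/";

echo $single, "\n"; // /<$tag[^>]*>/
echo $double, "\n"; // /<p[^>]*>/

var_dump(preg_match($single, "<p class='someclass'>")); // int(0)
var_dump(preg_match($double, "<p class='someclass'>")); // int(1)
```

Concatenating the pieces with the . operator sidesteps the quoting question entirely.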
function find_in_tags ($tag, $needle, $haystack) {
// [^>]* allows attributes inside the opening tag; $matches receives the captured text
if (preg_match( '/<'.$tag.'[^>]*>(.*'.$needle.'.*)<\/'.$tag.'>/si', $haystack, $matches )) {echo "found";} else {echo "not found";}
}
Many thanks for the help guys!
As is, the function will partially match words, which I don't want. I've tried /b for word endings and also not matching letters or numbers, but neither seems to work properly.
Can anyone help me with the syntax for matching 0-N characters that are not letters or numbers (or closing tag symbols)? [^a-zA-Z0-9].* doesn't seem to do it.
i.e. to match <$tag>$needle</$tag> but also <$tag>bleh $needle,</$tag> etc.
[^a-zA-Z0-9].* means: one character that is not a-zA-Z0-9, followed by 'any character that is not the end of a line' 0 or more times.
The problem is the misuse of the .(dot) meta character with a 0 to N modifier...
Modifiers work on the immediately preceding character or group of characters, so the correct syntax you asked for would be:
[^a-zA-Z0-9]*
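To make the difference concrete, here is a small sketch comparing the two forms. The sample strings and the variable names $strict and $loose are made up for illustration:

```php
<?php
// "chicken" followed only by non-alphanumerics, up to the closing tag.
$strict = '/chicken[^a-zA-Z0-9]*<\/p>/';
// One non-alphanumeric, then anything at all (the dot takes over).
$loose  = '/chicken[^a-zA-Z0-9].*<\/p>/';

var_dump(preg_match($strict, '<p>chicken, </p>'));    // int(1)
var_dump(preg_match($strict, '<p>chicken soup</p>')); // int(0): letters follow
var_dump(preg_match($loose,  '<p>chicken soup</p>')); // int(1): ".*" swallows "soup"
```

In the first pattern the * repeats the whole character class; in the second it repeats only the dot, so the class is checked against a single character.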
If you continue to have issues, please post some real examples of what you are matching and trying to accomplish -- some of us are *very* visual and if we cannot see the pattern it is tough for us to help you.
EG from above:
<$tag>bleh $needle,</$tag>
to me 'bleh' looks like letters or numbers, so I am not sure why you are asking for the expression you are, and I cannot offer any real advice on efficiency, or what might be missing.
Justin
I want to match the occurrence of a particular word (or number of words) within identified html tags.
So if $needle is 'chicken soup' (without the quotes) and the $tag is p (<p>), I want match to be true for <p>chicken soup</p> and <p>i like chicken soup for tea</p>
The script above achieves this, however, it will also be true for <p> notchicken soup</p> and <p>notchicken souple</p>. I figured the easiest way to get around this was to check for the existence of a character that was not a letter or number before and after the end of the string. But I also need to continue matching <p>chicken soup</p>. So basically, if there's a character at the beginning or end of the string, as long as that's not a letter or number that's OK. Although perhaps my approach is flawed ;)
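$tag="p";
$needle="Document";
// $haystack holds the HTML to search. Inside a character class \b means backspace, so [^\b]* matches almost any character.
preg_match("/<".$tag."[^>]*>[^\b]*(\b".$needle."\b)[^<]*<\/".$tag.">/im", $haystack, $result);
if(!empty($result[1])) {
echo "Success! ".$result[1];
}
else {
echo "No Match";
}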
I'll let you play with the efficiency -- to remove the multi-line modifier, delete the m after the closing slash.
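As a rough sketch, the pattern above could be wrapped in a function and checked against the chicken soup examples from earlier in the thread. The name find_word_in_tag is hypothetical, and the sketch assumes the needle starts and ends with a letter or digit, since \b only marks the edge between word and non-word characters (and counts the underscore as a word character):

```php
<?php
// Hypothetical wrapper; a sketch, not the original poster's code.
function find_word_in_tag(string $tag, string $needle, string $haystack): bool
{
    // Escape regex metacharacters and the "/" delimiter in both inputs.
    $t = preg_quote($tag, '/');
    $n = preg_quote($needle, '/');
    // [^<]* keeps the match inside one element that has no nested tags;
    // \b on both sides of the needle rejects partial-word matches.
    $pattern = '/<' . $t . '\b[^>]*>[^<]*\b' . $n . '\b[^<]*<\/' . $t . '>/i';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(find_word_in_tag('p', 'chicken soup', '<p>chicken soup</p>'));                // bool(true)
var_dump(find_word_in_tag('p', 'chicken soup', '<p>i like chicken soup for tea</p>')); // bool(true)
var_dump(find_word_in_tag('p', 'chicken soup', '<p> notchicken soup</p>'));            // bool(false)
var_dump(find_word_in_tag('p', 'chicken soup', '<p>notchicken souple</p>'));           // bool(false)
```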
Justin
I hate to be posting here again, but while this new code does exactly what I asked for (!), it's thrown up a separate problem - keywords in nested tags can no longer be found, so while <p class="test">Untitled Document</p> finds 'untitled document', <p class="test"><span>Untitled Document</span></p> or <p class="test"><span>Untitled</span> Document</p> does not.
I've tried some different approaches and none seem to get the desired result. Any help?
(If you're a drinking man, a pint is yours!)
$tag="p";
$close=preg_match("/^([^\b]+)/", $tag, $closing_tag);
$needle="Document";
preg_match("/<".$tag."[^>]*>[^\b]*(\b".$needle."\b)[^<]*<\/".$closing_tag[1].">/im", $haystack, $result);
Hope this helps.
Justin
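For the nested-tag case, one possible approach (a sketch under assumptions, not something posted in this thread) is to capture everything between the opening and closing tag, remove the inner markup with strip_tags(), and only then run the word-boundary test. The function name find_in_nested_tags is made up for the example:

```php
<?php
// Hypothetical helper: looks for $needle as whole words in the text of each
// <$tag> element, ignoring any markup nested inside it.
function find_in_nested_tags(string $tag, string $needle, string $haystack): bool
{
    $t = preg_quote($tag, '/');
    $n = preg_quote($needle, '/');
    // Lazy .*? with the s modifier captures up to the first closing tag,
    // including line breaks and nested elements such as <span>.
    $outer = '/<' . $t . '\b[^>]*>(.*?)<\/' . $t . '>/is';
    if (!preg_match_all($outer, $haystack, $m)) {
        return false;
    }
    foreach ($m[1] as $inner) {
        $text = strip_tags($inner); // drop the nested tags, keep their text
        if (preg_match('/\b' . $n . '\b/i', $text) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(find_in_nested_tags('p', 'Untitled Document',
    '<p class="test"><span>Untitled Document</span></p>'));   // bool(true)
var_dump(find_in_nested_tags('p', 'Untitled Document',
    '<p class="test"><span>Untitled</span> Document</p>'));   // bool(true)
var_dump(find_in_nested_tags('p', 'Untitled Document',
    '<p class="test">Untitled Documents</p>'));               // bool(false)
```

This still stops at the first closing tag, so an element nested inside another element of the same name (a div inside a div, say) would be cut short. For markup like that, an HTML parser such as PHP's DOMDocument is probably the safer route.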