<?php
$string1 = "<a href='http://www.mydomain.tld/randompage.html'>widget page</a>";
$string2 = '<a href="http://www.mydomain.tld/randompage.html">widget page</a>';
$string3 = '<a href=http://www.mydomain.tld/randompage.html>widget page</a>';
$string4 = '<a
href="http://www.mydomain.tld/randompage.html">widget
page</a>';
echo "1 $string1 <br>\n";
echo "2 $string2 <br>\n";
echo "3 $string3 <br>\n";
echo "4 with line breaks $string4 <br>\n";
// The pattern fails on newlines because . doesn't match \n without the s modifier, so first strip them.
$string4 = preg_replace('/[\n\r]+/'," ",$string4);
// Note that the [] character classes contain a single quote ' next to a double quote "
// This may not be obvious in this forum's display
$string1 = preg_replace('/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/i','$1',$string1);
$string2 = preg_replace('/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/i','$1',$string2);
$string3 = preg_replace('/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/i','$1',$string3);
$string4 = preg_replace('/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/i','$1',$string4);
echo "1 $string1 <br>\n";
echo "2 $string2 <br>\n";
echo "3 $string3 <br>\n";
echo "4 $string4 <br>\n";
?>
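The four identical preg_replace calls above can be folded into a single loop. This is a minimal sketch, assuming the same pattern and the same newline-stripping workaround; the $tests and $results array names are illustrative and not from the original post.

```php
<?php
// Same pattern as the original post, written once.
$pattern = '/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/i';

// The four test strings (hypothetical array; string 4 contains real newlines).
$tests = [
    1 => "<a href='http://www.mydomain.tld/randompage.html'>widget page</a>",
    2 => '<a href="http://www.mydomain.tld/randompage.html">widget page</a>',
    3 => '<a href=http://www.mydomain.tld/randompage.html>widget page</a>',
    4 => "<a\nhref=\"http://www.mydomain.tld/randompage.html\">widget\npage</a>",
];

$results = [];
foreach ($tests as $n => $html) {
    // Same workaround as the original: collapse runs of CR/LF into one space.
    $flat = preg_replace('/[\n\r]+/', ' ', $html);
    // Replace the whole anchor with capture group 1 (the URL).
    // Single quotes keep $1 literal so preg_replace, not PHP, interprets it.
    $results[$n] = preg_replace($pattern, '$1', $flat);
    echo "$n {$results[$n]} <br>\n";
}
```

Each line printed should be just the bare URL.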
breakdown:
/ - pattern delimiter
< - start with an opening angle bracket
.*? - followed by zero or more of any character; the ? makes the quantifier lazy (non-greedy) so it doesn't slurp up the whole string. This could include other attributes: class="myclass" title="mytitle"....
href - followed by href
\s*=\s* - followed by zero or more whitespace characters, an equals sign, and zero or more whitespace characters. This accounts for sloppy coding; spaces around = are not recommended but are supported
[\'"]* - followed by zero or more ' or " characters. Zero or more covers unquoted hrefs. The ' is escaped because the pattern sits inside a single-quoted PHP string; escape " instead if you put the pattern in a double-quoted string.
([^\'">]+) - followed by one or more (+) of anything that is NOT ', ", or >, which should be the URL. Wrapped in parentheses, it is captured as group 1 and referenced as $1 in the replacement
[\'"]* - followed by zero or more ' or " characters; again, zero or more covers unquoted attributes
> - followed by the closing angle bracket of the opening tag
.* - followed by zero or more of any character, i.e. the link text (sloppy, sorta: it is greedy, so on a line with several links it runs to the last </a>, but it works in conjunction with . . .
<\/a> - followed by the closing anchor tag; the / is escaped because / is the pattern delimiter. Note this will fail for malformed HTML without a closing tag, but that should be fairly obvious
/i' - closing pattern delimiter plus the i modifier for a case-insensitive match, so HrEf and </A> also match; the ' ends the PHP string
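The breakdown above can also live inside the pattern itself. This is a sketch, not the original author's code: it rewrites the same regex with the x (extended) modifier, where whitespace outside character classes is ignored and # starts a comment. It assumes ~ as the delimiter (so the / in </a> needs no escaping); $original, $html, $a and $b are illustrative names.

```php
<?php
// Same regex as the post, spread out with the x modifier and commented piece by piece.
$pattern = '~
    <                # opening angle bracket
    .*?              # lazily skip other attributes, e.g. class="myclass"
    href \s* = \s*   # href, optional whitespace, =, optional whitespace
    [\'"]*           # zero or more quote characters (none if unquoted)
    ([^\'">]+)       # group 1: the URL, anything but quotes or >
    [\'"]*           # closing quote(s), if any
    >                # closing angle bracket of the opening tag
    .*               # link text (greedy)
    </a>             # closing anchor tag, no escaping needed with ~
~ix';

// The compact form from the post, for comparison.
$original = '/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/i';

// Extra attribute, uppercase HREF, spaces around =, uppercase </A>.
$html = '<a class="myclass" HREF = "http://www.mydomain.tld/randompage.html">widget page</A>';

$a = preg_replace($pattern, '$1', $html);
$b = preg_replace($original, '$1', $html);
echo $a, "\n";
```

Both versions should return the same URL; the extended form is just easier to maintain.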
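Why it didn't work for string4 without the carriage return strip line: . does not match a newline unless the s (dotall) modifier is used, so .*? can't get from "<a" across the line break to "href", and .* can't span "widget" and "page" either. Using /is instead of /i as the modifiers would make the strip line unnecessary.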
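A minimal sketch of dropping the strip line by adding the s (dotall) modifier to the same pattern. The helper name extract_href() is hypothetical, and $without/$with are illustrative variables comparing /i alone against /is on the multi-line string4.

```php
<?php
// Same pattern as the post, with s added after i so . also matches newlines.
// extract_href() is a hypothetical helper, not part of the original post.
function extract_href(string $html): ?string
{
    return preg_replace('/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/is', '$1', $html);
}

$string4 = '<a
href="http://www.mydomain.tld/randompage.html">widget
page</a>';

// Original /i-only pattern: no match, so the string comes back unchanged.
$without = preg_replace('/<.*?href\s*=\s*[\'"]*([^\'">]+)[\'"]*>.*<\/a>/i', '$1', $string4);

// With /is the newline-strip step is not needed.
$with = extract_href($string4);

echo "without s: $without <br>\n";
echo "with s: $with <br>\n";
```

One trade-off: with s, the greedy .* can also run across lines, so in a larger multi-line document it would reach the last </a> in the whole text. Stripping newlines first has the same effect, since everything ends up on one line.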