I can usually work things out with an online regex reference guide, but I think I'm overlooking something here. It's probably simple, but I haven't been able to figure it out for the past couple of hours.
I have the following regex which works great:
preg_match_all("/href=\"?(.+?)[\" >]/i",$anchortitle_matches_str,$extintlink_matches);
It gives me all links on a site, whether they are in <a href="bla.html">, <a href=bla.html> or <a href=bla.html target=_blank> format.
So that's great.
Anyway, I'm trying to ignore ftp://, mailto:, javascript: etc.
I could do this by looping through my result array and skipping any unwanted results using strpos or similar; however, I know it has to be possible with regex.
Basically I'm trying to do this:
href=\"?mailto://|ftp://|javascript:(.+?)[\" >]
But the opposite. The above ONLY gives me links that DO contain ftp, mailto and javascript, whereas I'm trying to ignore those.
I can't figure out how to properly use ^ or otherwise negate my unwanted links using the above method. The same goes for character classes: whenever I use [ ], the regex treats the contents letter by letter, so:
href=\"?[^mailto://](.+?)[\" >]
This simply ignores ANY link with an m, a, i, l, t, o, : or / in that position. That's not what I want; I want it to ignore links with mailto:// in them (hence the subject of this post, ignore STRING!)
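To make the letter-by-letter behaviour concrete, here is a minimal sketch. The sample strings are made up for illustration, and the pattern is anchored with ^ so that only the first character is tested. A negated class like [^mailto://] matches exactly one character that is not in the set m, a, i, l, t, o, : and /, so it has no notion of the word "mailto".

```php
<?php
// A negated character class matches ONE character not in the set {m,a,i,l,t,o,:,/}.
// The "~" delimiter is used so the slashes need no escaping.
$class = '~^[^mailto://]~';

// Hypothetical sample values, chosen only to show the effect.
$results = [
    'mailto:me@example.com' => preg_match($class, 'mailto:me@example.com'), // starts with "m": no match
    'index.html'            => preg_match($class, 'index.html'),            // "i" is also in the set: no match
    'contact.html'          => preg_match($class, 'contact.html'),          // "c" is not in the set: match
];

var_dump($results);
```

So index.html is rejected just like the mailto link, purely because of its first letter.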
I hope that made sense.
Thanks in advance to anyone who can point me in the right direction!
Also I'm aware I could just look for links based on http:// or https:// but the problem is I'm also trying to find internal links so that's no solution :)
[edited by: eelixduppy at 1:29 pm (utc) on Jan. 14, 2008]
[edit reason] disabled smileys [/edit]
I didn't manage to get rid of the ftp links.
I tried lookbehind assertions but they didn't seem to be working :(
However, maybe someone else will get an idea from my NOT WORKING code -
$pattern = '%href=?(?<!ftp\.)([\w\./#&]+)[" ]>%';
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);
echo '<pre>';
print_r($matches);
echo '</pre>';
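For comparison, a negative lookahead (rather than a lookbehind) may be closer to what the original question needs. The following is only a sketch under some assumptions: the sample markup and the variable names $html and $links are invented for the example, and the list of schemes is limited to the three mentioned in the thread. The lookahead is placed before the optional quote and includes the quote itself, because otherwise the engine can backtrack, treat the quote as part of the link, and let the mailto link through.

```php
<?php
// Hypothetical sample markup covering quoted, unquoted and unwanted links.
$html = '<a href="somewhere.php">'
      . '<a href=bla.html target=_blank>'
      . '<a href="mailto:someone@somewhere.com">'
      . '<a href="ftp://somewhere.com">'
      . '<a href="javascript:void(0)">'
      . '<a href="http://example.com/page">';

// (?! ... ) fails the match if the text right after "href=" is an optional
// quote followed by one of the unwanted schemes. Nothing is consumed by it.
$pattern = '~href=(?!"?(?:mailto:|ftp:|javascript:))"?(.+?)[" >]~i';

preg_match_all($pattern, $html, $matches);
$links = $matches[1]; // captured link targets only

print_r($links);
```

With this sample input, the mailto, ftp and javascript links are skipped while relative and http links are kept.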
<?php
// Build a test string containing several kinds of links.
$str  = "<a href=\"somewhere.php\">";
$str .= "<a href=\"somewhere.php\" target=\"_new\">";
$str .= "<a href=\"mailto://someone@somewhere.com\">";
$str .= "<a href=\"ftp://somewhere.com\">";
$str .= "<a href=\"somewhere.php\" target=\"_blank\">";
// The character class contains no ":" or "@", so mailto: and ftp:// links cannot match.
$pattern2 = "/<a\shref=\"?[a-zA-Z0-9%_\-?.\s&=\"\/]+\"?>/si";
preg_match_all($pattern2, $str, $matches);
print_r($matches); // You'll need to "view source" to see the actual output because the browser will parse the HTML tags.
?>
If this doesn't work for you, I'm sure there are other ways, such as writing a loop that goes through your $matches array, stores the valid entries in a temp array, and skips any that contain the string you want to ignore.
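The loop idea could look roughly like this. It is a sketch, not a drop-in solution: $links stands in for the captured href values, $unwanted and $valid are names picked for the example, and stripos() is used instead of strpos() so that the check is case-insensitive.

```php
<?php
// Hypothetical list of href values, as they might come out of preg_match_all().
$links = [
    'somewhere.php',
    'mailto:someone@somewhere.com',
    'ftp://somewhere.com',
    '/about.html',
    'JavaScript:void(0)',
];

// Schemes to drop (assumed list, taken from the question).
$unwanted = ['mailto:', 'ftp:', 'javascript:'];

$valid = []; // the "temp array" holding the links we keep
foreach ($links as $link) {
    $keep = true;
    foreach ($unwanted as $prefix) {
        // stripos() returns 0 only when the link STARTS with the prefix;
        // === is required because a "not found" result is false, not 0.
        if (stripos($link, $prefix) === 0) {
            $keep = false;
            break;
        }
    }
    if ($keep) {
        $valid[] = $link;
    }
}

print_r($valid);
```

Checking for position 0 rather than "anywhere in the string" avoids throwing away an internal link that merely contains one of those words later in its path.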