RegExp to modify a href

Inserting a word after the link...

         

dari

8:51 pm on Mar 8, 2005 (gmt 0)

10+ Year Member



What I have is a paragraph of text containing multiple "<a href=..." tags. I want to do something like this:

Before change:
Hello, my name is foobar. I like to use <a href="http://www.google.com">this search engine</a> more than any other.

After change:
Hello, my name is foobar. I like to use <a href="http://www.google.com">this search engine</a> [google.com] more than any other.

My regexp so far that isn't working:

$types = array("href", "url");
while(list(,$type) = each($types)) {
preg_match_all ("|$type\=\"?'?`?([[:alnum:]:?=&@/#._-]+)\"?'?`?|i", $text, &$matches);
$ret[$type] = $matches[1];
}
$href = $ret['href'];
$url = $ret['url'];
foreach($href as $link) {
$link_array = parse_url($link);
$host = $link_array['host'];
$hostarray = explode(".",$host);
if(count($hostarray) > 2) {
array_shift($hostarray);
}
$domain = implode(".",$hostarray);
$search = '/<a href="$link" target="_blank" class="postlink">([[:alnum:]:?=&@\/#._-]+)<\/a>/';
$replace = '<a href="$link" target="_blank" class="postlink">\$1</a> [$domain]';
$text2 = preg_replace($search,$replace,$text2);
}

The first part (up to the "foreach") simply goes through the text and pulls out all the linked URLs. After that, I cycle through the array of URLs, trying to do the replacement as needed.

Essentially, I'm pulling the domain from the link and inserting into some square brackets after the link. The linked text is not necessarily the URL of the link itself.

I've been working at this for hours, but haven't had any success...Any suggestions?

ironik

9:27 pm on Mar 8, 2005 (gmt 0)

10+ Year Member



Looks like it could be simplified somewhat. Let the regex engine do the work for you:

<?php
$subject = '<a href="http://www.google.com">this search engine</a>';
$pattern = "/<a href=\"(https?|ftp|gopher|file|wais):\/\/((www)?([a-z0-9_-]+\.)+([a-z]{2,6}))\"([\w\s]*)>([^<]*)<\/a>/i";
echo preg_replace($pattern, "$0[$4$5]", $subject);
?>

That will output:

<a href="http://www.google.com">this search engine</a>[google.com]

Just incorporate that into your loops.
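
As a rough sketch of why the loops may not be needed at all: preg_replace rewrites every match in the subject, so the whole paragraph can be passed in one call. The sample text below is made up for illustration, and the replacement adds a space before the bracket to match the "After change" example in the first post. Note that this pattern only matches links with no path and no attributes containing quotes.

```php
<?php
// Same idea as the pattern above: "|" separates the allowed schemes,
// and the /i flag makes [a-z] match upper case as well.
$pattern = "/<a href=\"(https?|ftp|gopher|file|wais):\/\/((www)?([a-z0-9_-]+\.)+([a-z]{2,6}))\"([\w\s]*)>([^<]*)<\/a>/i";

// Hypothetical sample paragraph with two links.
$text = 'I like <a href="http://www.google.com">this search engine</a> and '
      . '<a href="http://www.php.net">this manual</a>.';

// $0 is the whole matched link; $4 is the last host label before the TLD
// (with its trailing dot) and $5 is the TLD.
$result = preg_replace($pattern, '$0 [$4$5]', $text);

echo $result;
```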

dari

1:40 pm on Mar 9, 2005 (gmt 0)

10+ Year Member



I think I've got you...the one thing I forgot to mention is that the link tag has other attributes besides the href:

<a href="http://www.google.com" target="_blank" class="link">this search engine</a>

I've tried modifying $pattern to include whitespace, alphas, double quotes and underscores before the [\w\s]* but can't get it to take....please help :)

dari

3:44 pm on Mar 9, 2005 (gmt 0)

10+ Year Member



I've got this now:
preg_match_all("'<a[^>]*>.*?</a>'si", $text, $matches);
$links = $matches[0];
$pattern = "/<a href=\"(https?|ftp):\/\/((www)?([a-z0-9_-]+\.)+([a-z]{2,6}))\"([[:print:]]*)>([^<]*)<\/a>/i";
foreach($links as $link) {
echo preg_replace($pattern, "$0 [$4$5]", $link)."<br>";
}

which matches the following:
www.foo.com
foo.com

but does not match:
www.foo.com.tw/foo/bar
www.foo.com/test.php?foo=bar

I.e., it does not match URLs that have anything (a path or a query string) after the .<tld>
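
A small check that reproduces this behaviour, as a sketch: the pattern requires the closing quote of the href right after the TLD, so anything between the TLD and the quote makes the match fail. The URLs and the link markup here are stand-ins based on the examples above, and the pattern is a lightly cleaned-up copy ("|" for alternation, [a-z] rather than [A-z]).

```php
<?php
// The closing \" sits directly after the TLD group ([a-z]{2,6}),
// so a path or query string after the TLD cannot be matched.
$pattern = "/<a href=\"(https?|ftp):\/\/((www)?([a-z0-9_-]+\.)+([a-z]{2,6}))\"([[:print:]]*)>([^<]*)<\/a>/i";

// Hypothetical test URLs based on the examples in the post.
$urls = array(
    'http://www.foo.com',
    'http://foo.com',
    'http://www.foo.com.tw/foo/bar',
    'http://www.foo.com/test.php?foo=bar',
);

$results = array();
foreach ($urls as $url) {
    $link = '<a href="' . $url . '" target="_blank" class="link">text</a>';
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $results[$url] = preg_match($pattern, $link);
    echo $url . ' => ' . $results[$url] . "\n";
}
```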

ironik

9:13 pm on Mar 9, 2005 (gmt 0)

10+ Year Member



This is untested, but you have to open up the regex pattern to include the forward slash character:

$pattern = "/<a href=\"(https?|ftp):\/\/((www)?([a-z0-9_-]+\.)+([a-z]{2,6})(\/?[\w\W])*\/?)\"([[:print:]]*)>([^<]*)<\/a>/i";
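
If the capture groups get hard to manage, one alternative is to let the regex find the links and let parse_url work out the host, which is closer to what the first post was attempting. This is only a sketch under some assumptions: the function name appendLinkDomains is made up, href is assumed to be the first attribute and double-quoted, the link text is assumed to contain no nested tags, and only a leading "www." is stripped from the host. It needs PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: appends " [host]" after each absolute http/https/ftp link.
function appendLinkDomains(string $html): string
{
    // The href value is everything up to the closing quote (so paths and
    // query strings are allowed); other attributes are everything up to ">".
    $pattern = '/<a\s+href="((?:https?|ftp):\/\/[^"]+)"[^>]*>[^<]*<\/a>/i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $host = parse_url($m[1], PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            return $m[0]; // leave the link untouched if no host can be parsed
        }
        // Assumption: only a leading "www." is dropped from the host.
        $domain = preg_replace('/^www\./i', '', $host);
        return $m[0] . ' [' . $domain . ']';
    }, $html);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

// Hypothetical sample text using the URLs that failed earlier in the thread.
$text = 'Try <a href="http://www.foo.com.tw/foo/bar" target="_blank" class="link">this</a> or '
      . '<a href="http://www.foo.com/test.php?foo=bar">that</a>.';

echo appendLinkDomains($text);
```

Because the whole text goes through one call, no preg_match_all or foreach step is needed beforehand.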