I'm working on a deadline and need some help with regular expressions.
I've got a slew of HTML files with style tags and HTML formatting that I wanted to strip. Fortunately, NoteTab handled the job well, but now I need to turn some URLs back into hrefs.
Here is what the files now contain:
Here is some sample text <http://www.internet.com/>www.internet.com
How would I structure the reg.exp. to turn that into:
<a href="http://www.internet.com/">www.internet.com</a>
AND here's one for you super-geniuses. If the target is NOT on my server, how would you make it look like this:
<a target="_blank" href="http://www.internet.com/">www.internet.com</a>
Of course, in a perfect world, I'd love to remove only the style and font tags, but I haven't found a tool that will help me do that.
$text = "Here is some sample text <http://www.internet.com/>www.internet.com some more text to test <http://www.test.com/>test";

function convertHrefs($text, $debug = false)
{
    // Protocols that may be converted; remove an entry (e.g. 'telnet') to skip it.
    $allowProtocol = array('http', 'https', 'ftp', 'wais', 'telnet');
    $protocols = implode('|', $allowProtocol);
    // $1 = full URL, $2 = protocol, $3 = link text (no spaces or > allowed)
    $pattern = "/<((" . $protocols . "):\/\/[^>\s]+)>([^>\s]+)/i";
    if ($debug)
    {
        // Dump every match so the capture groups can be inspected.
        preg_match_all($pattern, $text, $matches);
        print_r($matches);
    }
    return preg_replace($pattern, "<a href=\"$1\" target=\"_blank\">$3</a>", $text);
}

echo '<br />Replacement: ' . convertHrefs($text);
That will work as long as the link text that follows the <...> part doesn't contain any spaces or > characters. I put a protocol array in there so you can control which protocols are detected in the replacement (e.g. remove 'telnet' from the array to leave telnet links untouched).
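For the second part of the question (target="_blank" only when the link points off your own server), here is a rough sketch using preg_replace_callback and parse_url. The function name convertHrefsByHost, the $ownHost parameter, and the www.example.com hostname are made up for illustration, and it only handles http, https and ftp, so treat it as a starting point rather than a finished solution.

```php
<?php
// Hypothetical helper: $ownHost is an assumed name for your own server's hostname.
function convertHrefsByHost($text, $ownHost)
{
    // $m[1] = full URL, $m[2] = link text (no spaces, quotes or > allowed)
    $pattern = '/<((?:https?|ftp):\/\/[^>\s"]+)>([^>\s]+)/i';

    return preg_replace_callback($pattern, function ($m) use ($ownHost) {
        $host = parse_url($m[1], PHP_URL_HOST);
        // Treat the link as external unless its host matches ours (case-insensitive).
        $external = !is_string($host) || strcasecmp($host, $ownHost) !== 0;
        $target = $external ? ' target="_blank"' : '';
        return '<a' . $target . ' href="' . $m[1] . '">' . $m[2] . '</a>';
    }, $text);
}

// Assumed sample: www.example.com stands in for "my server".
$sample = 'See <http://www.example.com/>www.example.com and <http://www.internet.com/>www.internet.com';
echo convertHrefsByHost($sample, 'www.example.com');
```

A callback is used here because a plain preg_replace replacement string can't make a per-match decision about the host.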
I probably wouldn't use this myself because of the syntax you're using. I prefer the wiki or bbcode style, where an ending delimiter marks where the descriptive name for the link stops.
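To show what the ending delimiter buys you, here is a minimal sketch for a bbcode-like syntax. The [url=...]...[/url] form and the function name convertBbcodeUrls are assumptions for illustration, not something taken from your files. Because the closing [/url] marks the end of the link text, the text can contain spaces.

```php
<?php
// Assumed syntax: [url=http://host/path]link text[/url]
function convertBbcodeUrls($text)
{
    // $1 = URL (no spaces, quotes or ]), $2 = link text up to the first [/url]
    $pattern = '/\[url=((?:https?|ftp):\/\/[^\]\s"]+)\](.+?)\[\/url\]/is';
    return preg_replace($pattern, '<a href="$1">$2</a>', $text);
}

echo convertBbcodeUrls('Visit [url=http://www.internet.com/]the Internet.com home page[/url] today');
```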