Preg Replace

         

Gian04

4:55 pm on Oct 8, 2009 (gmt 0)

10+ Year Member



I have a site that accepts user comments, and users can even submit an HTML link.

Now I just want to programmatically change their post if they submitted a link.

example:
if they submitted blah blah http://www.example.com&a=123 blah blah

I will make it [mydomain.com...]

Of course I know that I can do it this way:
preg_replace('/http:\/\/([a-z0-9\.\/,_-]+)/i', '<a href="http://www.mydomain.com/redirect.php?u=$external_url">click here</a>', $post_body);

But I don't know how to get the whole external URL so that I can pass it to the variable $external_url.

Please help. Thanks

TheMadScientist

5:11 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure I understand the question exactly, but if you want to reference what's stored between the (), you simply count the opening ( from the left and use the corresponding reference number in the replacement string, which in this case would be $1.

preg_replace('/http:\/\/([a-z0-9\.\/,_-]+)/i', '<a href="http://www.example.com/redirect.php?u=$1">$1</a>', $post_body);

If your question is something different and I'm missing it, please be more specific, because I think that's what you were asking for.
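To illustrate the counting rule with more than one group, here is a minimal sketch. The sample text and the two-group pattern are made up for the illustration and are not from the posts above; $0 refers to the whole match.

```php
<?php
// Hypothetical sample text, not taken from the thread.
$post_body = 'see http://www.example.com/page for details';

// Two capturing groups, counted by their opening ( from the left:
// $1 is the host, $2 is the optional path. $0 is the entire match.
$pattern = '/http:\/\/([a-z0-9.-]+)(\/[a-z0-9.\/_-]*)?/i';

$result = preg_replace($pattern, '[whole: $0] [host: $1] [path: $2]', $post_body);

echo $result, "\n";
```

With this input the host group captures www.example.com and the path group captures /page.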

Gian04

5:33 pm on Oct 8, 2009 (gmt 0)

10+ Year Member



OK, I'll revise my question since my preg_replace code didn't work.

How can I strip a link URL from a comment post and replace it with something like [mydomain.com...]

or

[mydomain.com...]

TheMadScientist

5:58 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well what you have is really close I think... I didn't look too closely at the regular expression, but in looking again, it seems some characters you would like to match are missing:

Try this:
(It works for me.)

$post="Some text http://www.testing.com?test=test&moreTest=test some more text";

$post=preg_replace('/http:\/\/([a-z0-9\.\/,&\?=_-]+)/i', '<a href="http://www.example.com/redirect.php?u=$1">$1</a>', $post);
echo $post;

Of course this assumes URLs will always be preceded with http:// and if this is not the case, the expression will need to be adjusted.

andrewsmd

5:59 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know I have answered a similar question before but I cannot seem to find it. Can one of the moderators?

andrewsmd

6:15 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This will give you an array of all of the links; you then just need to go through the string and replace them.
$str = "http://example.com blah blah blah http://www.example.com/someplace?id=blah blah blah blah blah http://www.example.com/tol/arts_and_entertainment/books/article6866020.ece
blah blah blah http://www.example.com";

//this will keep track of where we are in the string
$spot = 0;

//an array of all of the urls we will need to replace
$returnArr = array();

//find all of the http:// with the string functions; you could use a regex,
//i'm just not as familiar with them
while (($place = strpos($str, "http://", $spot)) !== false) {

    //the url runs up to the next whitespace character or the end of the string,
    //so a url as the last element (or right before a line break) is handled too
    $length = strcspn($str, " \t\r\n", $place);
    $tempUrl = substr($str, $place, $length);

    array_push($returnArr, $tempUrl);

    //continue searching right after this url
    $spot = $place + $length;

}//while

//make sure we have unique urls to only replace them once.
//array_unique returns a new array, so the result has to be assigned
$returnArr = array_unique($returnArr);

Gian04

6:15 pm on Oct 8, 2009 (gmt 0)

10+ Year Member



TheMadScientist,

It didn't work:

1. It stops displaying images on my page that have the same src but no <a> tag (it should not affect text or images without an <a> tag).
2. It changes the anchor text to the link URL (the anchor text should remain the same).
3. It changes the link URL to [testing.com...] href=

TheMadScientist

6:30 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The pattern just needs to be adjusted so it matches user-submitted data, but not your current data. Quite a bit depends on where it will be run and exactly what should be matched or excluded: replacing a URL at the moment a comment is submitted is a completely different situation from running the code on an existing web page. To answer the question any better than I have, I need to know where it will be run, what exactly it should replace, and what should not be replaced. Regular expressions are very powerful tools, and they have to be exact to your specific situation.

I honestly think you might find andrewsmd's method easier if you are not familiar with regular expressions, because you are more likely to be able to work with it and adapt it to your needs. It will probably also need to be adjusted if you are running it on a full web page, though, because I think you will run into exactly the same issues as you did with my solution.
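As one possible adjustment for the three issues listed above, the pattern can be limited to the href value of <a> tags, so image src attributes and anchor text are left alone. This is only a sketch under assumptions: the sample comment is made up, hrefs are assumed to be double-quoted, and the redirect address is the example.com one used earlier in the thread.

```php
<?php
// Hypothetical comment containing both an image and a link.
$post_body = '<img src="http://www.example.com/pic.png"> '
           . '<a href="http://www.example.com/?a=1&b=2">my site</a>';

// Match only the href value inside an <a ...> tag (double-quoted, http or https).
// Group 1 is everything up to and including the opening quote, group 2 is the URL.
$pattern = '/(<a\b[^>]*?\bhref=")(https?:\/\/[^"]+)/i';

$result = preg_replace_callback($pattern, function ($m) {
    // urlencode() keeps ?, & and = in the external URL from being read
    // as part of the redirect script's own query string.
    return $m[1] . 'http://www.example.com/redirect.php?u=' . urlencode($m[2]);
}, $post_body);

echo $result, "\n";
```

The anchor text is never part of the match, so it stays exactly as the user typed it. A regex like this is still fragile against unusual markup (single-quoted or unquoted hrefs, for instance), so treat it as a starting point.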

andrewsmd

6:45 pm on Oct 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, you will need to add another while loop for https://; I forgot to do that.
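If a regex is acceptable, a single pattern can collect both schemes in one pass instead of a second loop. A rough sketch, assuming (like the loop) that a URL ends at the next whitespace character; the sample string is made up.

```php
<?php
// Hypothetical sample text with both schemes and one repeated URL.
$str = "http://example.com blah https://www.example.com/someplace?id=blah\n"
     . "blah http://example.com";

// https? matches "http" optionally followed by "s";
// \S+ takes everything up to the next whitespace character.
preg_match_all('/https?:\/\/\S+/i', $str, $matches);

// $matches[0] holds the full matches. array_unique() drops the duplicate
// and array_values() reindexes the result from 0.
$urls = array_values(array_unique($matches[0]));

print_r($urls);
```

Note that \S+ will also swallow trailing punctuation such as a period at the end of a sentence, so that case would need extra handling.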

rocknbil

1:30 am on Oct 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The largest problem, I think, is if your URL has the query string question mark:

example.com/somescript?key=val

When you tack that on to your redirect URL, you now have two query string delimiters, and any key/values in the remote query string are likely to wreak havoc in redirect.php.

There are some other issues you encounter with user input; many users may not even know what [protocol...] is. Mine doesn't address https either, and I couldn't figure out the #$%^$ period right at the end - run it and see what I mean. :-)

The goal was to experiment with a few "bad URL" conditions and encode the remote URL's troublesome characters.


<?php
header("Content-type:text/html");
// So it doesn't triple-wide the code on this board,
// a few predefined goodies:
$url_pattern = 'A-Z\d\-\_\.\%\/';
$http_escaped = 'http:\/\/';
$http = 'http://';
// The tlds are separated by the real pipe character
// (some boards display it as a broken bar instead)
$tlds = 'com|net|org|gov|us|tv';
$my_redirect = 'myredirect.php';
$output = NULL;
// A sample block of text.
$input = 'Hello, my name is Bill. I\'m here
to promote example.com. The URL is http://example.com/somescript.cgi?id=123&ref=Bill
and if you tell them "Bill sencha" I get a penny. Since I
have **sooooo** many friends this will make me an instant
millionaire. So go to http://www.example.com/somscript.php?id=123&ref=Bill as soon
as possible!
';
echo "<p>Original input:</p><p>$input</p>";
// Bleah. Make it a single line.
$input = preg_replace('/[\n\r]+/',' ',$input);
//
$words = explode(' ',$input);
//
foreach ($words as $word) {
// Couldn't figure out how to sub the period right after the tld, RATS!
$isPeriod = 0;
// Group the tlds so the \. applies to every alternative, not only the first
if (preg_match("/\.(?:$tlds)\b/i",$word)) {
// If you don't encode ?, &, and =, it will get munged up in myredirect.php.
// I'd use a predefined function here, but we want to ONLY encode the ones in the URL.
$untouched = $word;
$word = preg_replace('/\?/',"%3F",$word);
$word = preg_replace('/\&/',"%26",$word);
$word = preg_replace('/\=/',"%3D",$word);
if (preg_match('/.+\.$/',$word)) {
$isPeriod = 1;
$word = preg_replace('/(.+)\.$/i',"$1",$word);
}
// may or may not have http,www....
$word = preg_replace("/($http_escaped)*(www\.)*([$url_pattern]+)/i","<a href=\"$my_redirect?u=$http$2$3\">$untouched</a>",$word);
}
$output .= "$word ";
}
echo "<p>Output:</p><p>$output</p>";
echo "<p>As entities so you can see the code</p>";
echo htmlentities($output);
?>
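
As an alternative to replacing ?, & and = one at a time, PHP's built-in urlencode() can be applied to just the external URL once it has been isolated. A minimal sketch, reusing the myredirect.php name from the script above; the round trip through parse_str() stands in for what PHP does when it fills $_GET['u'] on the receiving side.

```php
<?php
// External URL taken from the sample text above.
$external = 'http://example.com/somescript.cgi?id=123&ref=Bill';

// urlencode() escapes ?, & and = (among others), so the whole external URL
// travels as the single value of the "u" parameter.
$link = 'myredirect.php?u=' . urlencode($external);

echo $link, "\n";

// Decode the query string the way PHP would for $_GET.
$queryString = substr($link, strpos($link, '?') + 1);
parse_str($queryString, $query);

echo $query['u'], "\n";
```

Because only the external URL is passed to urlencode(), the rest of the comment text is left untouched.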