Forum Moderators: coopster

PHP Curl - how to get ONLY the final redirect URL?

To catch spammers' destination URL

         

craig1972

1:10 am on Aug 11, 2010 (gmt 0)

10+ Year Member



Hi.

A section on my website has a feature where people can submit some info, part of which is their URL.

I'm now discovering that some cretins give us URLs that are actually forwarded to other URLs which are in turn forwarded to other URLs.

So I'm writing a script that tells me the *final* destination URL.

I think I can do this with cURL, right? I tried this code, but it only outputs the site HTML:

$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_HEADER, true);
$url = curl_exec($ch);
curl_close($ch);
echo $url;



All I need is the final URL. Don't need any content etc. I just want to "FOLLOWLOCATION" as many times as needed, and just report the ultimate destination URL.

Thanks for any advice!

phranque

6:24 am on Aug 11, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



have you tried something like this?
... $lastUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); echo $lastUrl; ...

craig1972

6:46 am on Aug 11, 2010 (gmt 0)

10+ Year Member



Thanks, but that doesn't work.

From the manpages:


CURLINFO_EFFECTIVE_URL (string)
Returns the effective URL as used in the most recent operation.


Which means it's not really the final URL. It's just the most recently used URL.

Here's my code:

$url = 'http://frxcvee.com'; 
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$lastUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
echo "Original: $url <br>Final: $lastUrl";
exit;


This prints the following:


Original: http://frxcvee.com
Final: http://frxcvee.com

craig1972

6:47 am on Aug 11, 2010 (gmt 0)

10+ Year Member



Oh wait, the curl_exec() call was missing. It works! THANKS! :)
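
For reference, the working approach from this exchange (run the transfer with curl_exec(), then read CURLINFO_EFFECTIVE_URL) might look roughly like the sketch below. The function name final_url and its parameters are made up for illustration, and the use of CURLOPT_NOBODY to skip downloading the page body is an assumption that was not part of the original code.

```php
<?php
// Hypothetical helper (name and signature are illustrative, not from the thread).
// Returns the URL cURL ended up at after following redirects, or null on failure.
function final_url(string $url, int $maxRedirects = 5): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // assumption: HEAD request, no body needed
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    // The transfer must actually run before curl_getinfo() has anything to report.
    $ok = curl_exec($ch);

    // curl_exec() returns false on errors such as DNS failure or too many redirects.
    if ($ok === false) {
        return null;
    }
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}
```

Calling final_url($url) on a submitted URL should then give the last URL reached through HTTP redirects. Some servers answer HEAD requests differently from GET, so dropping the CURLOPT_NOBODY line may be needed in practice.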

phranque

7:59 am on Aug 11, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



cool! you're welcome!

craig1972

4:06 pm on Aug 13, 2010 (gmt 0)

10+ Year Member



Hi, hope it's ok to ask this forum of experts for an additional idea. I implemented the above code and it works like a charm. But now I'm finding some spammers who enter a URL that serves a framed website. The frameset has two frames sized *,0, which means the entire site is in the first frame and the second frame is empty.

Because the site is framed, the submitted domain name does not show up as a spam domain. Is there a smart way to check whether a site is framed and then see whether one of its frames points to a spam site?
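
One possible way to approach the frame question is to fetch the page HTML, collect the src attribute of every frame and iframe element, and then run each of those URLs through the same spam check as the submitted URL. The sketch below covers only the extraction step. The function name frame_sources is hypothetical, and it assumes the HTML has already been downloaded and that PHP's DOM extension is available.

```php
<?php
// Hypothetical helper (illustrative name): lists the src URLs of all
// <frame> and <iframe> elements found in an HTML string.
function frame_sources(string $html): array
{
    if (trim($html) === '') {
        return [];                                   // nothing to parse
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);    // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach (['frame', 'iframe'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $src = trim($node->getAttribute('src'));
            if ($src !== '') {
                $sources[] = $src;                   // may be relative to the page URL
            }
        }
    }
    return $sources;
}
```

A non-empty result means the page is framed. Relative src values would still need to be resolved against the page URL before they are checked.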

Thx!

enigma1

2:11 pm on Aug 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe you can never be sure of the final URL. Even if you retrieve the Location header or parse the document for a meta refresh, a site still has plenty of scripting methods to redirect a visitor. It can even check who is making the request and decide whether to redirect on a case-by-case basis.

For example, my server could check whether your IP responds on port 80 and, if it does, not redirect.
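
Of the cases mentioned here, the meta refresh is the one a script can still pick up from static HTML. A rough sketch of that single step follows. The function name meta_refresh_target is hypothetical, and the pattern assumes the common "seconds; url=..." form of the content attribute. Script-driven redirects and per-visitor cloaking, as described above, remain out of its reach.

```php
<?php
// Hypothetical helper (illustrative name): returns the target of the first
// <meta http-equiv="refresh"> tag that carries a URL, or null if there is none.
function meta_refresh_target(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);    // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower(trim($meta->getAttribute('http-equiv'))) !== 'refresh') {
            continue;
        }
        // Assumed format: "<seconds>; url=<target>", with optional quotes around the target.
        $content = $meta->getAttribute('content');
        if (preg_match('/^\s*\d*\s*[;,]\s*url\s*=\s*["\']?(.*?)["\']?\s*$/i', $content, $m)
            && $m[1] !== '') {
            return $m[1];
        }
    }
    return null;
}
```

A URL returned this way could be fed back into the redirect check, with a cap on how many hops are followed.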