


HTTP Referer

   
9:56 pm on Jan 8, 2012 (gmt 0)



Hello, I have a website that logs referrers using the code below:

$ref = $_SERVER['HTTP_REFERER'];

On my site it shows the top websites that have been referring the most visitors.

The problem I'm having is that people are using the snippets below to cheat the system.

<div style="width:0;height:0;"><img style="width:0;height:0;" src="http://example.com"></div>

and

<script type="text/javascript" src="http://www.example.com">


The problem is that when people visit their site, it's treated as if they referred someone to my site. I have to manually block those websites. Is there a way to prevent people from doing this?


Thanks
10:36 pm on Jan 8, 2012 (gmt 0)

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Maybe you could fetch the referring page using CURL and validate it by making sure that the reference to your site is not in one of these known 'cheating' code blocks? Or maybe it is sufficient to just make sure that the link to your site does not appear within a src attribute - presumably it should only occur within an href attribute of an anchor? You could also penalise them if they have included the rel="nofollow" within the anchor?
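
Something along these lines, maybe (just a rough sketch, not production code: the domain is a placeholder, the regexes are crude, and a proper HTML parser would be safer than pattern matching):

$my_domain = 'example.net';  // your own hostname (placeholder)
$referrer  = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

$looks_legit = false;

if ($referrer !== '') {
    // a curl fetch (as in the longer example further down) would be more robust
    $html = @file_get_contents($referrer);

    if ($html !== false) {
        $quoted = preg_quote($my_domain, '/');

        // our domain inside an anchor's href attribute
        $in_href = preg_match('/<a\b[^>]*href\s*=\s*["\'][^"\']*' . $quoted . '/i', $html);
        // our domain inside any src attribute (the hidden-image / script tricks)
        $in_src  = preg_match('/\bsrc\s*=\s*["\'][^"\']*' . $quoted . '/i', $html);

        $looks_legit = ($in_href && !$in_src);
    }
}

If the anchor check passes, you could then look for rel="nofollow" on the same tag and penalise accordingly.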
11:54 pm on Jan 8, 2012 (gmt 0)



I tried hotlink protection, then realized they're not requesting a specific file type. Is there a way I can block it in .htaccess?
2:58 am on Jan 9, 2012 (gmt 0)

10+ Year Member



I'm pretty sure the only way to effectively block the domains that are trying to cheat is to do what penders suggested and actually check the referring page. Here is some working code to at least get you started, if you haven't written something already.


$ref = $_SERVER['HTTP_REFERER'];

$your_domain = "yourdomain.com";
$ua = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)";

if ( strlen(trim($ref)) ) {
    // if the referrer string is not empty, fetch the url
    $html = file_get_curl($ref, $ua);

    if ( $html ) {
        // check the source for your domain name in SRC for images and scripts
        if ( is_host_in_src($your_domain, $html) ) {
            // if your hostname was found in a SRC attribute, blank the referer... or handle it however you'd like
            $ref = "";
        }
    }
}

// Helper functions below

function is_host_in_src($host, $html) {
    // returns true if $host appears in any src attribute of the fetched page
    $src_array = src_extract($html);
    foreach ($src_array as $src) {
        if ( stristr($src, $host) ) {
            return true;
        }
    }
    return false;
}

function src_extract($html) {
    // extracts src urls from an html document and returns them in an array
    $preg = "/ src=(\"|')(.*?)(\"|')/i";

    $subs = array();
    preg_match_all($preg, $html, $subs);

    $num_src = sizeof($subs[0]);

    $src_array = array();
    for ($i = 0; $i < $num_src; $i++) {
        $src_array[] = $subs[2][$i];
    }
    return $src_array;
}

function file_get_curl($url, $ua = "") {
    // fetches a remote url with curl and returns the response body (false on failure)
    $ch = curl_init();

    $timeout = 10; // set to zero for no timeout
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $ua);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_ENCODING, "");

    $content = curl_exec($ch);
    curl_close($ch);
    return $content;
}


You may want to store trusted domains in a database table so you don't have to check them every time. Also, you probably don't want to be fetching extra external URLs every time a user requests a page, so the best approach is probably to store whatever referrer string is given and have a separate script process the list before it goes into your stats database.
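
For what it's worth, the deferred approach might look roughly like this (sketch only; the connection details and the table/column names such as raw_referrers and trusted_domains are invented for illustration):

// Sketch of the "log now, check later" idea. Table and column names
// (raw_referrers, trusted_domains, etc.) are invented; adjust to your schema.

$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');

// At request time: just record what the browser sent, no remote fetch.
$ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if ($ref !== '') {
    $stmt = $pdo->prepare("INSERT INTO raw_referrers (url, seen_at) VALUES (?, NOW())");
    $stmt->execute(array($ref));
}

// In the separate processing script: skip hosts you already trust, and only
// fetch and inspect the rest (using the code above) before they reach the stats.
$host  = parse_url($ref, PHP_URL_HOST);
$check = $pdo->prepare("SELECT COUNT(*) FROM trusted_domains WHERE host = ?");
$check->execute(array($host));

if (!$check->fetchColumn()) {
    // not trusted yet: this is where file_get_curl() / is_host_in_src() would run
}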
3:14 am on Jan 9, 2012 (gmt 0)



I will give it a shot thank you.
12:31 pm on Jan 9, 2012 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



'HTTP_REFERER' is not a trusted variable; it can be manipulated, so don't rely on it. Doing cURL or fsockopen calls will add latency to your server. And if they want to manipulate it and still serve a fake hard-coded link, they can do so in the page you try to parse:


<div><script language="javascript" type="text/javascript">
var fakelink = '<a href="http://www.example.com">www.example.com</a>';
</script></div>

Spiders may not treat it as a link, but you will. Now you're going to need more than a simple parser to detect it. And there are other ways to fake it.
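
Just to underline how cheap the header is to fake, any client can send whatever Referer it likes; for example (the URLs are placeholders):

// Illustration only: the Referer header is entirely client-controlled,
// so a script can claim to come from anywhere. URLs here are placeholders.

$ch = curl_init('http://www.yoursite.example/some-page');
curl_setopt($ch, CURLOPT_REFERER, 'http://www.whatever-i-like.example/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_exec($ch);
curl_close($ch);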
6:59 pm on Jan 9, 2012 (gmt 0)

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



<div><script language="javascript" type="text/javascript"> 
var fakelink = '<a href="http://www.example.com">www.example.com</a>';
</script></div>


This particular example won't result in the referer being logged at example.com, though. But yes, it is still possible that the site could 'fake it' and even send you legitimate content when you parse it.

If you do choose to attempt validation by looking up the referring site, you would only do this periodically when building your "top-website" report, storing "validated" and "last-validated" fields.
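
A periodic re-check could look something like this (sketch only; the table and field names are invented, and it assumes the file_get_curl() and is_host_in_src() helpers from the earlier post are included):

// Sketch of periodic re-validation. Table/field names are invented, and the
// helpers file_get_curl() and is_host_in_src() come from the earlier post.

$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$your_domain = 'yourdomain.com';
$ua = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)';

// only re-check domains that have never been validated or are stale (> 7 days)
$stale = $pdo->query(
    "SELECT host FROM referrer_domains
      WHERE last_validated IS NULL
         OR last_validated < DATE_SUB(NOW(), INTERVAL 7 DAY)"
)->fetchAll(PDO::FETCH_COLUMN);

foreach ($stale as $host) {
    $html = file_get_curl('http://' . $host . '/', $ua);
    $ok   = $html && !is_host_in_src($your_domain, $html);

    $upd = $pdo->prepare(
        "UPDATE referrer_domains SET validated = ?, last_validated = NOW() WHERE host = ?"
    );
    $upd->execute(array($ok ? 1 : 0, $host));
}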
8:40 pm on Jan 9, 2012 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Sorry, I wasn't clear: the example was to fool a regular parser, not to set the referrer. The referrer part could still be the same as in the OP.
8:45 pm on Jan 9, 2012 (gmt 0)



What I think I'm going to do is set it up so I can run it manually with the script above. I'll have it loop through the list of all the domains and echo something out if anything is found. I'll put a sleep function in so it doesn't overwork my server's CPU.
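
Something like this, perhaps (sketch only; the domain list is just an example, and it assumes the helper functions from the earlier post are included):

// Rough sketch of the manual pass. The domain list is just an example, and
// file_get_curl() / is_host_in_src() are the helpers from the earlier post.

$your_domain = 'yourdomain.com';
$ua = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)';

$domains = array('http://referrer-one.example/', 'http://referrer-two.example/');

foreach ($domains as $url) {
    $html = file_get_curl($url, $ua);

    if ($html && is_host_in_src($your_domain, $html)) {
        echo "Possible cheat: $url\n";
    }

    sleep(2); // spread the requests out so the run doesn't hammer the CPU/network
}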


Thank you all!