
PHP Server Side Scripting Forum

    
Http Referer
dcool86 - msg:4405023 - 9:56 pm on Jan 8, 2012 (gmt 0)

Hello, I have a website that logs referrers using the code below:

$ref = $_SERVER['HTTP_REFERER'];

On my site it shows the top websites that have been referring the most.

The problem I'm having is that people are using the snippets below to cheat the system.

<div style="width:0;height:0;"><img style="width:0;height:0;" src="http://example.com"></div>

and

<script type="text/javascript" src="http://www.example.com">


The problem is that when people visit their site, it's treated as if they referred someone to mine. I have to manually block these websites. Is there a way to prevent people from doing this?


Thanks

 

penders - msg:4405039 - 10:36 pm on Jan 8, 2012 (gmt 0)

Maybe you could fetch the referring page using cURL and validate it by making sure that the reference to your site is not in one of these known 'cheating' code blocks? Or maybe it is sufficient just to make sure that the link to your site does not appear within a src attribute - presumably it should only occur within the href attribute of an anchor? You could also penalise them if they have included rel="nofollow" in the anchor.
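For what it's worth, a rough sketch of that href/nofollow check might look something like this (untested; the function name is made up and DOMDocument is just one way to do it):

// Rough sketch (untested): does $host appear in a normal anchor's href,
// and if so, is that anchor marked rel="nofollow"?
function check_anchor_to_host($host, $html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from sloppy markup

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (stripos($href, $host) !== false) {
            $rel = strtolower($a->getAttribute('rel'));
            return (strpos($rel, 'nofollow') !== false) ? 'nofollow' : 'ok';
        }
    }
    return 'none'; // no anchor to your site at all
}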

dcool86 - msg:4405059 - 11:54 pm on Jan 8, 2012 (gmt 0)

I tried hotlink protection, then realized they're not requesting a specific file type. Is there a way I can block it in .htaccess?

Finger - msg:4405102 - 2:58 am on Jan 9, 2012 (gmt 0)

I'm pretty sure the only way to effectively block the domains that are trying to cheat is to do what penders suggested and actually check the referring page. Here is some working code to at least get you started, if you haven't written something already.


$ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

$your_domain = "yourdomain.com";
$ua = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)";

if ( strlen(trim($ref)) ) {
    // if the referrer string is not empty, fetch the url
    $html = file_get_curl($ref, $ua);

    if ( $html ) {
        // check the source for your domain name in SRC for images and scripts
        if ( is_host_in_src($your_domain, $html) ) {
            // if your hostname was found in a SRC attribute, blank the referer
            // ... or handle it however you'd like.
            $ref = "";
        }
    }
}

// Helper functions below

function is_host_in_src($host, $html){
    // returns true if $host appears in any src attribute of the document
    $src_array = src_extract($html);
    foreach ($src_array as $src){
        if ( stristr($src, $host) ) {
            return true;
        }
    }
    return false;
}

function src_extract($html){
    // extracts src urls from an html document and returns them in an array
    $preg = "/ src=(\"|')(.*?)(\"|')/i";

    $subs = array();
    preg_match_all($preg, $html, $subs);

    $num_src = sizeof($subs[0]);

    $src_array = array();
    for ($i = 0; $i < $num_src; $i++){
        $src_array[] = $subs[2][$i];
    }
    return $src_array;
}

function file_get_curl($url, $ua = ""){
    // fetches a url with curl and returns the response body (false on failure)
    $ch = curl_init();

    $timeout = 10; // set to zero for no timeout
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $ua);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_ENCODING, "");

    $content = curl_exec($ch);
    curl_close($ch);
    return $content;
}


You may want to store trusted domains in a database table so you don't have to check them every time. Also, you probably don't want to be fetching extra external URLs on every page request, so the best approach is probably to just store whatever referrer string is given and then have a separate script process the list before it goes into your stats database.
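Something along these lines at request time, for example (just a sketch; the referrer_log table and column names here are hypothetical):

// At request time: just record the raw referrer string, no external fetching.
// (Hypothetical referrer_log table with ref_url and hit_time columns.)
$ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

if ( strlen(trim($ref)) ) {
    $db = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    $stmt = $db->prepare("INSERT INTO referrer_log (ref_url, hit_time) VALUES (?, NOW())");
    $stmt->execute(array($ref));
}

// A separate script (run from cron, say) can then work through the logged URLs
// with file_get_curl() / is_host_in_src() from above and flag the cheaters
// before anything goes into your stats table.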

dcool86 - msg:4405104 - 3:14 am on Jan 9, 2012 (gmt 0)

I will give it a shot, thank you.

enigma1 - msg:4405173 - 12:31 pm on Jan 9, 2012 (gmt 0)

HTTP_REFERER is not a trusted variable: it can be manipulated, so don't rely on it. Doing cURL or fsockopen requests will also add latency to your server. And if they want to manipulate it and still give out a fake hard-coded link, they can do so in the page you try to parse.


<div><script language="javascript" type="text/javascript">
var fakelink = '<a href="http://www.example.com">www.example.com</a>';
</script></div>

Spiders may not treat it as a link, but your parser will. Now you would need more than a simple parser to detect it, and there are other ways to fake it.

penders - msg:4405315 - 6:59 pm on Jan 9, 2012 (gmt 0)

<div><script language="javascript" type="text/javascript"> 
var fakelink = '<a href="http://www.example.com">www.example.com</a>';
</script></div>


This particular example won't result in the referer being logged at example.com, though. But yes, it is still possible for the site to 'fake it' and even send you legitimate content when you parse it.

If you do choose to attempt validation by looking up the referring site, you would only do this periodically when building your "top websites" report, storing "validated" and "last-validated" fields.
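Roughly like this, just to illustrate (hypothetical table and column names):

// Hypothetical referrers table: (host, validated, last_validated).
// When building the report, only re-check referrers that have never been
// validated or whose last validation is more than a week old.
$db = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');

$stale = $db->query(
    "SELECT host FROM referrers
     WHERE validated IS NULL
        OR last_validated < NOW() - INTERVAL 7 DAY"
);

foreach ($stale as $row) {
    // ... fetch and validate $row['host'] here, then update the
    // validated and last_validated fields accordingly ...
}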

enigma1 - msg:4405342 - 8:40 pm on Jan 9, 2012 (gmt 0)

Sorry, I wasn't clear: the example was to fool a regular parser, not to set the referrer. The referrer part could still be the same as in the OP.

dcool86 - msg:4405354 - 8:45 pm on Jan 9, 2012 (gmt 0)

What I think I'm going to do is set it up so I can run the script above manually. I'll have it loop through the list of all the domains and echo something out if a match is found. I'll put a sleep call in it so it doesn't overwork my server's CPU.
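Roughly like this (untested, using the helper functions Finger posted above):

// Rough plan: run manually against my list of referring domains, using
// file_get_curl() and is_host_in_src() from Finger's code above.
$your_domain = "yourdomain.com";
$ua = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)";
$domains = array("http://www.example.com/", "http://www.example.net/"); // my referrer list

foreach ($domains as $url) {
    $html = file_get_curl($url, $ua);
    if ($html && is_host_in_src($your_domain, $html)) {
        echo "Possible cheater: " . $url . "\n";
    }
    sleep(2); // keep the load down on my server and theirs
}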


Thank you all!
