Blocking Proxies - Nice Little Code

         

internetheaven

10:00 pm on Mar 14, 2012 (gmt 0)




Found this code on the web:

<?php // Reject visitors whose IP accepts a connection on port 80 (1-second timeout).
if (@fsockopen($_SERVER['REMOTE_ADDR'], 80, $errno, $errstr, 1))
    die("Proxy access not allowed"); ?>

Which finally killed that damned HMA "Proxy" website which was stealing all our content. (I use the word "proxy" loosely: for all intents and purposes they are a content theft site, since the proxied pages are kept and left for search engines to index and rank.)

Anyway, it only works on URLs that end in .php, not when anything comes AFTER the .php

e.g. /folder/page.php?type=45

How do I get this code to function when parameters are being passed?

Thanks
Mike

incrediBILL

12:04 am on Mar 15, 2012 (gmt 0)




That solution won't solve all your proxy problems, especially not search engines crawling and indexing your pages via a proxy.

The only true solution to search engines crawling via a proxy is to validate that a visitor claiming to be a search engine is actually coming from that search engine's IP addresses, and to block it otherwise.

Use full round-trip rDNS checking, which works for Google, Bing, Yahoo, etc.

IP -> rDNS -> IP

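$ip = $_SERVER['REMOTE_ADDR'];
$rDNS = gethostbyaddr( $ip );           // reverse lookup: IP -> hostname
$verifiedip = gethostbyname( $rDNS );   // forward lookup: hostname -> IP

// The forward lookup must return the original IP (comparison, not assignment).
if ( $ip === $verifiedip ) {
    if ( substr($rDNS, -14) == '.googlebot.com') {
        echo 'This really is Googlebot';
    }
}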

If you don't end up with the same IP, it's not really Googlebot crawling from an authorized and verifiable location.

And to those who claim rDNS lookup is too slow: HOGWASH. Cache the result per IP and read it from your cache on subsequent visits. Hold it for 24 hours or more, as the search engines don't change IPs very often.
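
For anyone who wants to see the caching idea spelled out, here is a rough sketch, not a definitive implementation. It assumes a writable cache directory and stores one small file per IP. The function name verify_googlebot_cached, the $cacheDir parameter, the file naming, and the two optional lookup callables (there only so the lookups can be swapped out for testing) are all my own inventions for illustration, not anything from the code above.

```php
<?php
// Hypothetical helper: round-trip rDNS check for Googlebot with a per-IP file cache.
// $reverse / $forward default to PHP's gethostbyaddr / gethostbyname.
function verify_googlebot_cached(
    string $ip,
    string $cacheDir,
    int $ttl = 86400,            // hold results for 24 hours
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbyname';

    // One cache file per IP; md5 keeps the file name safe for IPv4 and IPv6.
    $file = rtrim($cacheDir, '/') . '/rdns_' . md5($ip) . '.txt';

    // Serve the stored answer while the entry is younger than the TTL.
    clearstatcache(true, $file);
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file) === '1';
    }

    // IP -> rDNS -> IP: the hostname must end in .googlebot.com
    // and must resolve back to the same IP.
    $suffix = '.googlebot.com';
    $host = $reverse($ip);
    $ok = is_string($host)
        && substr($host, -strlen($suffix)) === $suffix
        && $forward($host) === $ip;

    // Cache both positive and negative results.
    file_put_contents($file, $ok ? '1' : '0', LOCK_EX);
    return $ok;
}
```

You would call it as verify_googlebot_cached($_SERVER['REMOTE_ADDR'], '/path/to/cache') only for requests whose user agent claims to be Googlebot, and block the request when it returns false. A database table or memcache would do the same job as the files; flat files are just the shortest thing to show.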