Blocking clickbots and spambots may increase Adsense earnings

Simple honeypot to block revisits

         

ian_D

5:01 am on Feb 7, 2015 (gmt 0)

10+ Year Member



I've been using this example on some of my sites for a while and find it very effective at reducing spambot and clickbot activity. It not only stops these bots from wasting server resources, it also appears to have significantly increased RPM and reduced clawbacks by preventing invalid AdSense clicks and impressions.

I'm hoping someone here has a little more layout experience than I have and can sanitize and simplify it for use as a global include.

It needs some work to eliminate the need to copy/paste the forms and css into each page. That should all be able to reside in the include file if valid html and inline css are used.

I prefer a hardcoded $datFile path but others may like getcwd().

I've had no issues on low to medium traffic sites, up to 3000 pageviews, but I have no idea if this will scale well for higher traffic sites. All it's really doing is checking IP against whatever is stored in the $datFile so it depends how large you allow that to get. I suppose you could rotate that file if it gets too large.
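For what it's worth, rotating the file could look something like the sketch below. This is only an assumption about how it might be done: rotateDatFile() and its $maxBytes limit are made-up names, not part of the script that follows.

```php
<?php
// Hypothetical helper, not part of the script below: rename the data file
// once it grows past $maxBytes so the per-request lookup stays small.
function rotateDatFile($datFile, $maxBytes)
{
    clearstatcache(true, $datFile);
    if (!file_exists($datFile) || filesize($datFile) <= $maxBytes) {
        return false; // nothing to rotate
    }
    // Keep the old list under a dated name, e.g. badIPs.dat.20150207-050100
    $archive = $datFile . '.' . date('Ymd-His');
    if (!rename($datFile, $archive)) {
        return false; // could not rename, leave everything as it is
    }
    touch($datFile); // start a fresh, empty list
    return $archive;
}
```

Keep in mind that once the file is rotated, the script no longer recognises the archived IPs, so they should already be in your htaccess deny list before you rotate.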

I didn't include dynamic writing to htaccess here because I've found it too unpredictable in the past.

Viewing the formatted deny list should probably be done in a separate page too.

Feel free to add improvements.


<?php

// Bad Bot Logging and Blocking

// This script captures all visitor IPs when the page loads but only records bot IPs
// based on hidden form submission that your visitors won't see but bots will.
// Bad bots will test any form they find and once they do, their IP will be logged
// and they will be blocked from reaccessing the page.
// The hidden bot form should be placed before any other interactive elements (eg: forms or ads) on your page.

// To then block them from your entire site, see the htaccess usage example further down

// Add the following line to any page you want protected.
// include('./spam.inc.php');

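// Place this HTML on the same page, near the top preferably. It is shown here inside a
// PHP comment for reference only; it has to go in the page's HTML, outside the PHP block.
// **Change the message a bit so bots can't learn to detect and avoid it!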


/*




<div id="non">
<form action="" method="post" target="_self">
<p>If you complete these fields, you will be added to our list of problem users.
In addition, this page will no longer work for you. Thanks for visiting yoursite.com
</p>
Email Address<br />
<input type="text" name="email" value="">
<br />
Contact<br />
<input type="text" name="contact" value="">
<br />
Comment<br />
<textarea cols="40" rows="6" name="comment"></textarea>
<br />
<input type="submit" value="Submit" />
</form>
</div>




*/

// Example css for hidden div:
/*

#non {
width:280px;
max-width:280px;
height:20px;
max-height:20px;
display:none;
}

*/

// IMPORTANT !
// IMPORTANT !
// IMPORTANT !

// Edit $datFile to the full path to your data file, in your Home directory.
// This file will be automatically created if it does not already exist


// IMPORTANT- must edit !

$datFile = '/home/*********/public_html/badIPs.dat';


// END OF EDITABLE FIELDS


// Is there a $datFile? Create a new, empty file if it does not already exist.
// touch() is used so no file handle is left open.

if(!file_exists($datFile)){
touch($datFile);
}


// Read $datFile into an array, one IP per line, skipping blank lines.

$bad = file($datFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if($bad === false){
$bad = array();
}

// Used for debugging only. Uncomment to show all IPs captured in unsorted order
//print_r($bad);

// Get the visitor IP

$ip = GetIP();

// Are they a bot? Has it been here before?
// If it has, just kill the page.

if (in_array($ip, $bad)) {

// In case the bot is acting from a cached page, clear the submitted form data first.
// Emptying the request arrays is sufficient here, because die() ends the script
// immediately afterwards.

$_POST = array();
$_GET = array();
$_REQUEST = array();

die("Spam Bot Behaviour Detected and Blocked");

} else {


$p = 0;

if(isset($_POST['email']) && ($_POST['email']) !==''){
$p++;
}

if(isset($_POST['contact']) && ($_POST['contact']) !==''){
$p++;
}

if(isset($_POST['comment']) && ($_POST['comment']) !==''){
$p++;
}

if($p !==0){

// If it's a new bot, write the IP to your data file, clear the form data and kill the page.
// LOCK_EX stops two simultaneous requests from interleaving their writes.

file_put_contents($datFile, $ip . "\n", FILE_APPEND | LOCK_EX);

$_POST = array();
$_GET = array();
$_REQUEST = array();

die("Spam Bot Behaviour Detected and Blocked");
}


}



// HTACCESS USAGE EXAMPLE:

// To view all IPs formatted for htaccess in sorted order
// usage: http://www.yourwebpage.com/index.php?showbad=true

// ** Data will appear at the top of the page outside of your css.
// A bit crude but.....

if(isset($_REQUEST['showbad']) && $_REQUEST['showbad'] === 'true'){
natsort($bad);
echo "<pre>\n";
echo '#badIPs Last Updated: '.date(DATE_RFC2822)."\n\n";
echo 'order allow,deny'."\n";
echo 'allow from all'."\n";
foreach($bad as $badIP){
// Skip blank lines and escape each entry, since it is echoed into an HTML page
if(trim($badIP) !==''){
echo "deny from ". htmlspecialchars(trim($badIP)) ."\n";
}
}
echo "\n".'#badIPs Last Updated: '.date(DATE_RFC2822)."\n\n";
echo "</pre>\n";
}



function GetIP()
{
    // HTTP_CLIENT_IP and HTTP_X_FORWARDED_FOR are request headers set by the client
    // or a proxy, so they can be forged; REMOTE_ADDR is the connecting address.
    $candidates = array(
        getenv("HTTP_CLIENT_IP"),
        getenv("HTTP_X_FORWARDED_FOR"),
        getenv("REMOTE_ADDR"),
        isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : false,
    );
    foreach ($candidates as $candidate) {
        if ($candidate && strcasecmp($candidate, "unknown")) {
            // X-Forwarded-For may hold a comma-separated list; use the first entry.
            $parts = explode(",", $candidate);
            $first = trim($parts[0]);
            // Only accept a syntactically valid IP so nothing else reaches $datFile.
            if (filter_var($first, FILTER_VALIDATE_IP)) {
                return $first;
            }
        }
    }
    return "unknown";
}
?>

aristotle

2:53 pm on Feb 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't understand how it works. You appear to be blocking the clickbots AFTER they try to access your website. But by then they've already done a fake click on the ad on the publisher site. So how does blocking them subsequently at your site prevent fake clicks? I don't understand it.

aristotle

3:06 pm on Feb 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops
I think I understand now. I was thinking about adwords. I don't use adsense, so that's why I got confused.

ian_D

3:32 pm on Feb 7, 2015 (gmt 0)

10+ Year Member



It's not a firewall. It's a honeypot that lets you build your own blocklist.
It does have the added benefit of blocking revisits until you update your htaccess with the captured IPs. Once they submit the form, the page (and ads) no longer appears to them.
Easier than trying to keep up manually watching for new bot ranges.

chocobo

12:24 am on Feb 8, 2015 (gmt 0)

10+ Year Member



I wish I knew how to implement this. I am sooo not code-savvy.

ian_D

1:34 am on Feb 8, 2015 (gmt 0)

10+ Year Member



That's why I'm hoping someone here can simplify installation.
I don't have the html experience to deal with how to insert valid html and div styles using just an include that will work with every site.
Most of my high earning sites are only 3 or 4 pages so I just use the copy/paste method to place the forms and css, then include() the script. I think it would be possible to echo the html with an include right below the <body> tag but I don't know how that scores with designers or html gurus.
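A rough sketch of that echo idea, in case it helps someone take it further. It is untested against real templates, honeypotFormHtml() is a made-up name, and the style is put inline on the div so no separate css is needed:

```php
<?php
// Hypothetical helper for spam.inc.php: returns the hidden form with its
// style inline, so nothing has to be pasted into each page's HTML or CSS.
function honeypotFormHtml($siteName)
{
    // Escape the site name because it is written into the page as HTML.
    $site = htmlspecialchars($siteName, ENT_QUOTES, 'UTF-8');

    return '<div style="display:none;">' . "\n"
        . '<form action="" method="post" target="_self">' . "\n"
        . '<p>If you complete these fields, you will be added to our list of problem users. '
        . 'In addition, this page will no longer work for you. Thanks for visiting '
        . $site . '</p>' . "\n"
        . 'Email Address<br />' . "\n"
        . '<input type="text" name="email" value="" />' . "\n"
        . '<br />' . "\n"
        . 'Contact<br />' . "\n"
        . '<input type="text" name="contact" value="" />' . "\n"
        . '<br />' . "\n"
        . 'Comment<br />' . "\n"
        . '<textarea cols="40" rows="6" name="comment"></textarea>' . "\n"
        . '<br />' . "\n"
        . '<input type="submit" value="Submit" />' . "\n"
        . '</form>' . "\n"
        . '</div>' . "\n";
}

// Usage, right below the <body> tag of a page that already includes the script:
// echo honeypotFormHtml('yoursite.com');
```

The field names have to stay email, contact and comment, because those are the ones the script checks in $_POST.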

Me=aspergers, so it's hard to explain it all...

Fyi, I currently have about 3000 captured bot IPs in my blocklist. This is since May 2014.

MrSavage

3:07 am on Feb 8, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I want to say thank you Ian for putting this up and mentioning your success. It's invaluable. Most often there aren't solutions discussed, but you have brought something big here. I do feel now, that I owe it to myself to wake up and deal with problem traffic. I know that it's major and I suppose I've been somewhat oblivious to its effects on my earnings potential. Google said our quality of traffic plays a role in earnings. I'm thick headed, what can I say. So there is no question, that for me, cleaning up my traffic is priority #1 right now. I've woken up to mobile, now it's time to deal with my traffic, once and for all. Thanks again for offering this up. I just need to figure it out.

Evan Salamanca

4:13 pm on Feb 8, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



You are awesome Ian. I applied it (after moving the DIV out of the PHP block) and since last night my IP list is up to ten. Hopefully this helps to reverse my CPC decline.

ian_D

9:17 pm on Feb 8, 2015 (gmt 0)

10+ Year Member



I think it'll be interesting to hear how many IPs each of us captures.
I'm sure with our differing content, traffic sources, backlinks, and geographical areas, we may attract different bots or different numbers of bots.

This capture form is actually quite conservative too. It's looking for bots that use autofill to test forms.
You could probably catch even more by not checking for empty submissions.
For example,
change:
if(isset($_POST['email']) && ($_POST['email']) !==''){
$p++;
}

To:
if(isset($_POST['email'])){
$p++;
}

Or you could maybe add another hidden form with an image button or image link. (using nofollow and noindex of course).

Depends how aggressive you want to be.

freitasm

1:51 am on Feb 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I think it is a good idea - similar to what some of us do to avoid comment/contact form spamming (hidden fields, if they have anything in them the comment comes from a bot therefore spam).

I will code something like this around my ASP/SQL-based forum to see how it goes. Perhaps not hide the content but hide the ads...

Will report back later.

freitasm

6:27 am on Feb 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Actually, after thinking a bit about it I won't do it... The main reason is that our site is a tech forum. I am sure someone at some point will figure what those fields are for and start spoofing IP Addresses to lock out other users.

Better keep it quiet.

Evan Salamanca

7:38 am on Feb 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Already caught 22 bad IPs. This is kinda fun.

ian_D

7:29 pm on Feb 9, 2015 (gmt 0)

10+ Year Member



"...some point will figure what those fields are for and start spoofing IP Addresses to lock out other users."

If your users are that malicious, wouldn't you already be having issues with them locking you out of the server by tripping max login failure or whatever for your IP?
I guess it's ~possible~ but I think it's very unlikely. Unless you've Really pissed someone off?

The idea of hiding only the ads would be good in some situations.
Just replace the die() line with a true/false value and wrap your ads with an if() conditional.
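Something along these lines, as a sketch only. shouldShowAds() and $showAds are made-up names; $ip and $bad are the visitor IP and the array of logged IPs from the script, and writing a new bot's IP to $datFile would still be done the same way as before:

```php
<?php
// Hypothetical variant: instead of die(), work out a flag and let the page decide.
function shouldShowAds($ip, array $bad, array $post)
{
    if (in_array($ip, $bad, true)) {
        return false; // known bot: keep the content, drop the ads
    }
    foreach (array('email', 'contact', 'comment') as $field) {
        if (isset($post[$field]) && $post[$field] !== '') {
            return false; // a honeypot field was filled in on this request
        }
    }
    return true;
}

// In the page template (assumed names):
// $showAds = shouldShowAds($ip, $bad, $_POST);
// if ($showAds) { /* ad code goes here */ }
```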

freitasm

8:12 pm on Feb 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



We have a community with a hundred thousand users and growing. We have tens of thousands of new posts every week in our forums.

Occasionally our moderators will ban one or two people every week or so - breach of our FUG (Forum Usage Guidelines) which include no spamming, no name calling and other things.

A couple of times in the last two years we have had a DDoS immediately after one user or another was banned.

So in answer to your comment our users are not malicious but, as in everything on the Internet, on occasion you face the odd "muppet".

But yes, since the number of users who need this treatment is low, I decided to implement this last night. We have an ASP/SQL site, so the code provided was adapted and released after some internal testing.

In the last 12 hours we recorded 53 IP addresses behaving like this.

Interesting because our site is running behind Cloudflare, which in theory should remove this kind of traffic. Or perhaps those are new ones and Cloudflare is not blocking yet. I wonder if we removed Cloudflare how many more we would see.

not2easy

8:47 pm on Feb 9, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Some sites don't have a problem with forms, but they may be getting problems with other bots that ignore robots.txt. For those, there is an old cure (2004), and I can attest it works fine in alerting me to scrapers that haven't read/obeyed robots.txt. I don't want to take this thread off its topic of ian_D's Spambot Solution, so I will give you the link, but please start a different discussion if there are implementation questions. It works with .htaccess via a .php file. This is the old BadBots Trap from member Birdman: [webmasterworld.com...]

Also, some may not have seen it, but there's a whole forum here that shares information about blocking unwanted bots: [webmasterworld.com...]

For anyone who is capturing IPs from unwanted traffic with an eye to blocking, there are some general basics to follow:
    1. Be sure that you aren't blocking visitors that you want on your site. Bots can visit from a residential ISP IP address and in those cases you might want to just block that one IP for a period of time.
    2. Normally, it is not efficient to block one IP address. If the IP belongs to a known offender, it is more efficient to block the entire CIDR range for that IP.
    3. The "403" page you send offenders to should include a way to appeal in case an errant human falls into the trap - and it needs to be designated in your .htaccess file.
    4. Don't rely solely on a script. No matter how good your script is, it will not catch every unwanted visitor. Get familiar with examining the raw access logs for your site and you can learn to block by UA and give malicious visitors a roadblock for what they are trying to do.
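
On point 2, for anyone checking ranges in PHP rather than in .htaccess, a minimal sketch of a CIDR test might look like this. ipInCidr() is a made-up name, it handles IPv4 only, and it assumes a 64-bit PHP build:

```php
<?php
// Hypothetical helper: true if an IPv4 address falls inside a CIDR range
// such as "192.0.2.0/24". IPv6 is not handled in this sketch.
function ipInCidr($ip, $cidr)
{
    if (strpos($cidr, '/') === false) {
        return false; // not in network/bits form
    }
    list($network, $bits) = explode('/', $cidr, 2);

    $ipLong = ip2long($ip);
    $netLong = ip2long($network);
    $bits = filter_var($bits, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => 0, 'max_range' => 32),
    ));
    if ($ipLong === false || $netLong === false || $bits === false) {
        return false; // malformed address or prefix length
    }

    // Build the netmask for the prefix length; /0 matches every address.
    $mask = ($bits === 0) ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);

    return ($ipLong & $mask) === ($netLong & $mask);
}
```

In .htaccess itself the same range can be written directly, for example deny from 192.0.2.0/24 (a documentation range, used here only as a placeholder).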

freitasm

10:34 pm on Feb 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Running for three days now and collected 250 IP addresses.