I have a personal hobby site with a reasonably decent readership, because the content, which is photography related, is pretty helpful for others. The site is basically a few hundred articles about how to do a certain type of photography, and people can add comments to every article. It's not the busiest site in the world, but it has a history of attracting a lot of bots from China that [GET] and then [POST], I guess trying to add spam through the comment form. About 75% are from China, but there's Ukraine and some other suspects in there as well.
Their attempts to add comments always fail because of the approval and CAPTCHA setup. Nevertheless they are annoying, because they make so many page requests that they skew a bespoke 'most popular pages' widget I customised. The bots literally latch onto insignificant pages in waves of hundreds, sometimes thousands, during a day, for no apparent reason (PS: I think this may be why the BBC shows weird old stories in its most popular widget as well).
I have tried a lot of different ways to do something about them, but I finally seem to have hit on something that appears to be reducing their interest in my site, to the extent that I'm now only seeing 1% of the visits that I did a month ago. I'm not putting this out as a solution for everyone or as the ideal solution, far from it. What I'm interested in is whether this is coincidence and down to something else entirely, or whether it works for others too. I've always thought there were no real ways to stop bots coming to your site, but if this does lessen their interest, then surely it can only be good for people.
My approach is twofold. First, my [GET][POST] bots all have no referrer and a Mozilla/4.0 user agent, so I have a block for this in .htaccess:
# Chinabots (Moz4) - tied together
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0
RewriteCond %{REQUEST_URI} !.*\.ico
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* 1.php [L]
The other block I employ is one that looks for straightforward POST attempts with no referrer:
# Otherbots (Moz5) - post but no referrer. Just die.
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* 1.php [L]
The effect of both these blocks is to intercept these requests and rewrite them internally to 1.php, with no further rules applied ([L]). But it is the file 1.php that is also interesting. It is almost completely empty; all that it contains is:
<?php
header("HTTP/1.1 403 Forbidden");
?>
I think there's something in this, as it seems to be working for me. In my logfiles there's a 0 byte response with this, so the bot is completely starved of information and routes onward, and there's no data cost either. I just wonder if the software behind this sort of bot might have some programmatic problem handling a 0 byte response, or if this disrupts its activity sufficiently for the bot to stop trying, fingers crossed.
Granted, this would be terrible if it were a real person, as you're just serving up a blank white page, and usually we like to have a nice fancy error page that suggests some options for people. Having spent a year with one of the fancy ones with links to everywhere, I had precisely one user tell me of a problem (a Moz4 user accessing bookmarks, which is the downside of this), while the bots kept hammering away, so keeping the blank response is a choice I'm making.
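For anyone wanting to try this on their own site, the two blocks above could be assembled into a single .htaccess along the following lines. This is only a sketch under a few assumptions: mod_rewrite is enabled, the .htaccess and 1.php both sit in the site root, and the extra pass-through rule for 1.php is an addition that is not in the rules above (it just makes explicit that the rewritten request should not be matched again).

```apache
# Sketch only: assumes mod_rewrite is available and that this .htaccess
# and 1.php both live in the site root.
RewriteEngine On

# Assumption, not in the rules above: leave requests for 1.php itself
# untouched so the bot rules are not evaluated against it again.
RewriteRule ^1\.php$ - [L]

# Chinabots (Moz4): Mozilla/4.0 user agent, empty referrer, not a favicon request
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0
RewriteCond %{REQUEST_URI} !.*\.ico
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* 1.php [L]

# Otherbots (Moz5): POST with an empty referrer
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* 1.php [L]
```

Note that mod_rewrite's [F] flag would also return a 403, but with the server's standard error body rather than an empty one, so it is not equivalent to the 0 byte response described above.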
For the record, the things that didn't reduce their interest:
- redirecting to fancy 403 page with links to site
- redirecting to fancy 404 page with links to site
- 400 response
- inpage PHP block
- .htaccess IP CIDR span blocks
Last note on CIDR IP span blocks. I gave up when I reached 200: block one range and they just come from another. This method really was whack-a-mole for me, and if anything it increased the number.