
Forum Moderators: coopster & jatar k


blocking robots using apache / php

     
9:26 am on Jan 22, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 23, 2002
posts:110
votes: 0


Hi All,

I've recently had a huge crawl by a number of spambots on my sites, and I need to start blocking them. I wanted to consult with you about it.
A few years ago, on a Java-based project I was involved in, we solved this by monitoring the number of requests per minute to certain pages. For every unfriendly user agent / IP, if the request rate reached a certain threshold, the host was assumed to be a hostile bot and was blocked with a captcha page. The host was unblocked only once it passed the captcha test.

Now I'm working on a PHP platform, and I'd rather not go through the hassle of redeveloping the entire mechanism in PHP. Plus, since it's been a few years, I thought that something like this must already exist :-)

I wanted to ask if anyone knows of an Apache module or script that does something similar for a site,
i.e. identifies hostile bots and presents them with captcha tests or otherwise blocks them.

Many thanks!

11:02 am on Jan 22, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 15, 2004
posts:941
votes: 0


Well, you could try adding something like the following to your .htaccess file:

# Requires mod_rewrite; returns 403 Forbidden to matching user agents.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC]
RewriteRule ^.*$ - [F]

Take notice of the OR in the first condition.
Every bot you block must have [NC,OR] at the end of its line (no space inside the brackets), except the last one, which takes only [NC].

Now all you have to do is find a list of bots to block. Just search for one and you will find plenty of information.

Best of luck.

11:14 am on Jan 22, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 28, 2002
posts:505
votes: 0


A mechanism based on the number of requests per timeframe, implemented in PHP for Apache/Unix, is described here:

[webmasterworld.com...]
[webmasterworld.com...]
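
For readers who just want the general idea, here is a minimal sketch of counting requests per timeframe in PHP. It is not the code from the threads linked above. The names `WINDOW_SECONDS`, `MAX_REQUESTS` and `isOverLimit` are hypothetical, the threshold values are arbitrary, and the request log is a plain in-memory array, so a real site would have to keep it in shared storage (a file, a database, or a cache) to carry counts across requests.

```php
<?php
// Hypothetical values, chosen only for illustration; tune for your own traffic.
const WINDOW_SECONDS = 60;  // length of the sliding window
const MAX_REQUESTS   = 30;  // requests allowed per client within the window

/**
 * Record one request for a client and report whether the client has
 * exceeded the limit. $log maps a client key (for example IP plus
 * user agent) to a list of request timestamps and is updated in place.
 */
function isOverLimit(array &$log, string $clientKey, int $now): bool
{
    $cutoff = $now - WINDOW_SECONDS;

    // Keep only the timestamps that still fall inside the window.
    $recent = array_filter(
        $log[$clientKey] ?? [],
        function ($timestamp) use ($cutoff) {
            return $timestamp > $cutoff;
        }
    );

    // Count the current request as well.
    $recent[] = $now;
    $log[$clientKey] = array_values($recent);

    return count($log[$clientKey]) > MAX_REQUESTS;
}
```

In a page, the key could be built from `$_SERVER['REMOTE_ADDR']` and `$_SERVER['HTTP_USER_AGENT']`, with `time()` passed as `$now`. When the function returns true, the script could send a 403 or show a captcha page as described in the first post; neither is implemented in this sketch.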

Kind regards,
R.