PHP Spider Trap


isitreal - 4:22 pm on Mar 7, 2004 (gmt 0)


Birdman: I was following that thread too, as well as the .htaccess spider blocking one. Like you, I'm not comfortable with Perl, and this looks like a really good solution.

Thanks for posting it. Can you keep us updated if you find any problems with it?

====
footnote:
I started testing this, and it works exactly as claimed. It's really nicely thought out and easy to implement. Regarding the permissions, they can be set to 404 for the getout.php file and 606 for the .htaccess file.

The group permissions only apply to other users of the server who are in the file's group, and the execute permission matters for entering folders (or for files run directly as programs), not for a PHP file that the web server simply reads, I think anyway. So all you need is read permission on the getout.php file and read/write permission on the .htaccess file. Someone correct me if I'm wrong about that.
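For anyone who has to think twice about what those numbers mean, here is a small sketch in Python that spells out an octal mode as the usual owner/group/other string. It is not part of Birdman's script, and the function name describe_mode is made up for illustration.

```python
def describe_mode(mode: int) -> str:
    """Render a numeric Unix mode such as 0o404 as a 9-character rwx string.

    The string is ordered owner, group, other (three characters each).
    """
    out = []
    for shift in (6, 3, 0):               # owner, group, other
        bits = (mode >> shift) & 0b111    # the three permission bits for this class
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)


print(describe_mode(0o404))  # r-----r--  (getout.php: read only, no group access)
print(describe_mode(0o606))  # rw----rw-  (.htaccess: read/write, no group access)
```

So 404 and 606 both leave the group with nothing and never set the execute bit, which matches the reasoning above.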

If this script does in fact keep adding blocked IP addresses to the list, that might lead to some problems on a large site. A spider could be using a dynamically assigned IP address, which would mean that some other user might conceivably find themselves blocked in the future. I can't tell for sure if that's the case from testing it on just my own IP address.

Whatever the case, I'm definitely going to test this thing out and see if it starts catching spiders. It's a very well thought out, elegant solution, much better than the .htaccess spider blocking lists I was playing with last year. Those can be fooled so easily by a spider that identifies itself with a standard Navigator user-agent string.
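To make the dynamic IP worry concrete, here is a rough sketch in Python of one way a block list could expire old entries so that a recycled address is not blocked forever. This is only an illustration under my own assumptions: I don't know how Birdman's script stores its list, the function names are hypothetical, the "deny from" lines are just the usual Apache form, and the IPs are documentation addresses.

```python
def add_block(blocks: dict, ip: str, now: float) -> None:
    """Record the time (in seconds) at which an IP fell into the trap."""
    blocks[ip] = now


def prune(blocks: dict, max_age: float, now: float) -> dict:
    """Return only the entries that were blocked less than max_age seconds ago."""
    return {ip: t for ip, t in blocks.items() if now - t < max_age}


def render_deny_lines(blocks: dict) -> str:
    """Build the lines that would go into .htaccess (assumed 'deny from' format)."""
    return "\n".join(f"deny from {ip}" for ip in sorted(blocks))


blocks = {}
add_block(blocks, "192.0.2.1", now=0)        # trapped long ago
add_block(blocks, "198.51.100.7", now=1000)  # trapped recently

# At time 1200 with a 500 second limit, only the recent entry survives.
current = prune(blocks, max_age=500, now=1200)
print(render_deny_lines(current))  # deny from 198.51.100.7
```

In practice the limit would be days or weeks rather than seconds, and whether expiring entries is worth it depends on how often the same spiders come back.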


Thread source: http://www.webmasterworld.com/php/3104.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com