New to webmastering, relatively speaking. I have just taken over a neglected site. The previous webmaster did nothing to protect it, and it has obviously been hit with huge amounts of spam email, etc.
Now we want to update the site and would like to make sure that the spambots don't have an easy time of it.
First, here's what I've done so far.
I altered the .htaccess files to deny known user-agents. I tried to find the most up-to-date list, which was somewhat of a challenge, as I know little about this whole business.
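A blocklist like that can be easier to maintain as a plain list that you turn into rules. Below is a minimal Python sketch, assuming Apache with mod_rewrite enabled and rewrite rules allowed in .htaccess. `htaccess_block` is a hypothetical helper, and the three agent names are only illustrative examples, not a current or complete list. It prints `RewriteCond`/`RewriteRule` lines that answer any matching user-agent with 403 Forbidden.

```python
import re

# Illustrative example names only; substitute the list you actually maintain.
BAD_AGENTS = ["EmailSiphon", "EmailCollector", "ExtractorPro"]


def htaccess_block(agents):
    """Build mod_rewrite lines that send 403 Forbidden to the given user-agents."""
    if not agents:
        # With no conditions, the RewriteRule below would match every request.
        return ""
    lines = ["RewriteEngine On"]
    for i, agent in enumerate(agents):
        # [NC] = case-insensitive; [OR] chains conditions, except on the last one.
        flags = "[NC]" if i == len(agents) - 1 else "[NC,OR]"
        # ^ anchors the match to the start of the User-Agent header.
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} ^{re.escape(agent)} {flags}")
    # [F] returns 403 Forbidden for any request that matched a condition above.
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_block(BAD_AGENTS))
```

Paste the printed lines into .htaccess. Bear in mind that the User-Agent header is supplied by the client, so this only stops bots that identify themselves honestly.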
I found this post while googling. It is somewhat outdated (from 2001).
I would like to know whether you think this is a good method for locking out the naughty beasts. It involves setting a trap: listing a non-existent directory as disallowed in the /robots.txt file, logging which bots ignore that request and go for the directory anyway, and then using a script to deny access to the requesting IPs. Quoted here:
Stopping the most pernicious and egregious spiderts can be easy, though:
1. Use mod_rewrite, or some tool that does what mod_rewrite does, on your server.
2. Insert the line Disallow: /email_addresses/ into your robots.txt file.
3. Every time some visitor requests that explicitly disallowed directory, rewrite the request to a CGI script that logs their IP address.
4. Finally, configure your .htaccess/mod_rewrite files to deny access to any visitor whose IP address is in that log file.
Thus the spidert is kick-banned instantly, rather than much later when you get around to perusing your log files, by which time it's too late.
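The recipe quoted above can be sketched in Python, with one deliberate swap: instead of mod_rewrite plus a CGI script plus .htaccess deny rules, a single WSGI middleware does the logging and the blocking. This is a minimal sketch assuming the site runs as a Python WSGI app mounted at the site root; `BotTrap`, `banned_ips.txt` and the stand-in `site` app are hypothetical names, while `/email_addresses/` is the trap directory from the quote. A client that requests the disallowed directory has its IP appended to the ban file and gets 403 on that request and every later one.

```python
import threading
from wsgiref.simple_server import make_server

# Trap directory from the quoted recipe; robots.txt tells crawlers to stay out.
TRAP_DIR = "/email_addresses"
ROBOTS_TXT = b"User-agent: *\nDisallow: /email_addresses/\n"


class BotTrap:
    """WSGI middleware that bans any client requesting the disallowed trap directory."""

    def __init__(self, app, ban_file="banned_ips.txt"):
        self.app = app
        self.ban_file = ban_file  # hypothetical log of banned IPs, one per line
        self.lock = threading.Lock()
        self.banned = set()
        try:
            with open(ban_file) as f:
                self.banned = {line.strip() for line in f if line.strip()}
        except FileNotFoundError:
            pass  # nothing banned yet

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        path = environ.get("PATH_INFO", "")
        if path == "/robots.txt":
            # Step 2: advertise the trap directory as disallowed.
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [ROBOTS_TXT]
        with self.lock:
            # Step 3: a client that ignored robots.txt gets its IP logged.
            in_trap = path == TRAP_DIR or path.startswith(TRAP_DIR + "/")
            if in_trap and ip not in self.banned:
                self.banned.add(ip)
                with open(self.ban_file, "a") as f:
                    f.write(ip + "\n")
            # Step 4: any logged IP is refused, starting with this very request.
            blocked = ip in self.banned
        if blocked:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


if __name__ == "__main__":
    make_server("127.0.0.1", 8000, BotTrap(site)).serve_forever()
```

Because well-behaved crawlers never request a disallowed path, only clients that ignore robots.txt should land in the ban file. Two caveats: banning a shared IP locks out everyone behind it, and behind a reverse proxy `REMOTE_ADDR` is the proxy's address, so the real client IP would have to come from elsewhere.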
If you think this is a good method for a small community site, I would love some suggestions on implementing it. I know nothing of CGI and little of the Apache server, but I'm learning fast.
What do you all think? Is this a good method? If not, what are people doing these days?
So, in addition to putting a trap in robots.txt, it's a good idea to use several other methods as well. See this thread [webmasterworld.com] for a Perl-based solution, and this thread [webmasterworld.com] for a PHP-based solution (you can use both).