Forum Moderators: phranque
RewriteEngine On
# Always allow robots.txt, so compliant bots can read it:
RewriteRule ^robots\.txt$ - [L]
# Whitelist: known browsers pass through untouched; [L] stops rule
# processing here, so the blacklist below is never evaluated for them:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Opera [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Lynx [NC]
RewriteRule ^.* - [L]
# Blacklist: everything else is checked against the banned agents
# and refused with 403 Forbidden:
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC]
# longer list of blocked user agents here
RewriteRule ^.* - [F,L]
Would this be reasonable and serve the intended purpose (the WebReaper/etc. blacklist never being processed for whitelisted browsers), or am I missing something basic?
I've used a browser whitelist, and it had 30 or 40 entries in it just to support the "popular ones" while rejecting most of the spoofers.
A combined whitelist/blacklist approach like the one you've shown is indeed the best way to keep the file smaller, but the result will never be "small." Concentrate on the user-agents that actually abuse your site regularly; a fully comprehensive ban list in your .htaccess file would likely slow your server to an unserviceable speed.
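For a short, focused list of known abusers, a SetEnvIf-based block is a common lighter-weight alternative to a long chain of RewriteConds. A minimal sketch in Apache 2.2-style syntax (the two agent names are only examples):

```apache
# Tag requests from known abusers (case-insensitive substring match):
SetEnvIfNoCase User-Agent "WebReaper" bad_bot
SetEnvIfNoCase User-Agent "HTTrack"   bad_bot

# Refuse tagged requests with 403 Forbidden:
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>
```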
Look into the Perl and PHP bad-bot trap scripts found here on WebmasterWorld as well; they will reduce the time you spend reacting to abuse.
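The trap idea, roughly: disallow a directory in robots.txt, link to it invisibly from a page, and log whoever fetches it anyway, since only bots that ignore robots.txt will ever request it. A minimal sketch of the .htaccess side (the /trap/ path and trap.php name are hypothetical; the script itself would record the offending IP for blocking):

```apache
# robots.txt tells compliant bots to stay out:
#   User-agent: *
#   Disallow: /trap/

# Any request for the trap directory is handed to the logging script:
RewriteEngine On
RewriteRule ^trap/ /trap.php [L]
```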
Jim
Slightly related... would I be correct in guessing from the "User-Agent: " prefix that the second one in the list below isn't actually a browser, but something else failing to spoof correctly?
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1
Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Thanks for your help. :)