Look at your raw access logs (stats are pretty useless for this kind of project) and get the exact user-agent name that is scraping your site, the IP address(es) it's coming from, and any other info that may be useful in blocking it in a precise manner.
Blocking by IP address and user-agent name is covered extensively (some say "excessively") in the long-running, four-part "A Close to perfect .htaccess ban list" [webmasterworld.com] threads.
Either mod_rewrite or a combination of mod_access and mod_setenvif can be used to solve this problem.
Jim
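For reference, the mod_setenvif plus mod_access route mentioned above might look roughly like the sketch below. It assumes Apache 1.3/2.2-style access directives (Apache 2.4 uses Require instead) and uses libwww-perl only as an example user-agent string; bad_bot is an arbitrary variable name chosen for illustration, not something from the thread.

```apache
# Flag any request whose User-Agent header contains "libwww-perl"
# ("bad_bot" is a hypothetical environment variable name)
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot

# Allow everyone by default, then deny requests that carry the flag
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

With Order Allow,Deny, a request that matches both an Allow and a Deny line is denied, so flagged requests get a 403 while all other visitors are unaffected.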
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} libwww-perl/5\.803
RewriteRule ^ - [F]
I'm not sure precisely what each part does, though, or how to fit in an exception for my own site's IP.
Btw, will this conflict with my other rewrites, or does each rewrite rule finish up each section?
RewriteEngine on
RewriteCond %{REMOTE_ADDR} !^192\.168\.0\.1$
RewriteCond %{HTTP_USER_AGENT} libwww-perl/5\.803
RewriteRule .* - [F]
Jim
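As a rough annotated sketch of how a ruleset like this is read (not part of the original posts): RewriteCond lines apply only to the single RewriteRule that follows them, and multiple conditions are ANDed by default. The 192.168.0.1 address is a placeholder for your own server's IP, and the final redirect rule is a hypothetical example of an unrelated rewrite.

```apache
RewriteEngine On

# Condition 1: the client IP is NOT 192.168.0.1 (the leading "!" negates the pattern)
RewriteCond %{REMOTE_ADDR} !^192\.168\.0\.1$
# Condition 2 (ANDed with condition 1): the User-Agent contains "libwww-perl/5.803"
RewriteCond %{HTTP_USER_AGENT} libwww-perl/5\.803
# If both conditions hold: match any URL, leave it unchanged ("-"), return 403 Forbidden
RewriteRule .* - [F]

# Hypothetical unrelated rule: the two conditions above do not carry over to it
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```

Because [F] ends rewrite processing immediately for a blocked request, the ban should not interfere with later rules; putting it ahead of your other rewrites means blocked requests never reach them.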