Forum Moderators: open
I did a tracert to archive.org, but I thought they used a different UA? I've also had a lot more requests from these IPs!
IA uses libwww to check whether they're allowed to display your graphics, as I recall. So they're probably grabbing your robots.txt to see whether you've Disallowed your graphics or the subdirectories your graphics reside in.
I generally block libwww-perl accesses, but I allow AltaVista, Inktomi, and IA Archiver to use it to access my sites because they use it legitimately.
Similarly, I block Java and Python URLlib, but allow Google to use them because several of the Google Labs tools use those User-agents.
My main concern is e-mail scrapers and site downloaders, just due to the wasted bandwidth, but I have no problem with legitimate users of these sometimes-too-powerful utilities.
# Block libwww-perl except from AltaVista, Inktomi, and IA Archiver
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC]
RewriteCond %{REMOTE_ADDR} !^209\.73\.(1[6-8][0-9]|19[01])\.
RewriteCond %{REMOTE_ADDR} !^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteCond %{REMOTE_ADDR} !^209\.237\.23[2-5]\.
RewriteRule !^403.*\.html$ - [F]
#
# Block Java and Python URLlib except from Google
RewriteCond %{HTTP_USER_AGENT} ^(Python.urllib|Java/?[1-9]\.[0-9]) [NC]
RewriteCond %{REMOTE_ADDR} !^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteRule !^403.*\.html$ - [F]
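Jim's conditions can be sanity-checked outside Apache. Here is a minimal Python sketch applying the same regular expressions; the sample User-Agent and IP values are illustrative assumptions, and the allowlists are combined into one list for brevity (in the actual rulesets each UA group is paired with its own crawler IPs):

```python
import re

# UA patterns from the rules above (Apache's [NC] flag = re.I)
LIBWWW = re.compile(r"^libwww-perl/[0-9]", re.I)
JAVA_PY = re.compile(r"^(Python.urllib|Java/?[1-9]\.[0-9])", re.I)

# Allowlisted crawler IP ranges from the RewriteCond lines above
ALLOW = [
    re.compile(r"^209\.73\.(1[6-8][0-9]|19[01])\."),        # AltaVista
    re.compile(r"^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\."),  # Inktomi
    re.compile(r"^209\.237\.23[2-5]\."),                    # IA Archiver
    re.compile(r"^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\."),  # Google
]

def blocked(ua: str, ip: str) -> bool:
    """True if the request would get a 403 under the rules above."""
    if not (LIBWWW.match(ua) or JAVA_PY.match(ua)):
        return False  # UA condition doesn't match; rule never fires
    return not any(p.match(ip) for p in ALLOW)

print(blocked("libwww-perl/5.68", "10.0.0.1"))       # -> True  (blocked)
print(blocked("libwww-perl/5.68", "209.237.233.5"))  # -> False (allowlisted IP)
print(blocked("Mozilla/5.0", "10.0.0.1"))            # -> False (UA not matched)
```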
HTH,
Jim
I found these two entries in my logs which, I think, indicate successful libwww-perl requests:
copper.webfusion.co.uk - - [15/Mar/2008:03:38:35 +0000] "GET //index2.php=http://www.pressurekru.co.uk/images/profile/jpg.txt? HTTP/1.1" 200 666 "-" "libwww-perl/5.68"
dublin.clusterspan.net - - [15/Mar/2008:03:41:03 +0000] "GET //index2.php=http://www.pressurekru.co.uk/images/profile/jpg.txt? HTTP/1.1" 200 665 "-" "libwww-perl/5.805"
Am I correct in assuming the above two hits were not blocked despite having the following lines in my .htaccess file?
RewriteEngine On
#
#
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(python[-.]?urllib|java/?[1-9]\.[0-9]) [NC]
#
#
RewriteRule ^.*$ 403.php [L]
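One way to narrow this down is to test the patterns directly. A short Python check (an illustration of the regexes, not of how Apache evaluates them) against the User-Agents from the log lines above:

```python
import re

# The three User-Agent conditions from the .htaccess above ([NC] = re.I);
# RewriteCond patterns are unanchored substring matches unless ^-anchored.
patterns = [
    re.compile(r"libwww-perl", re.I),
    re.compile(r"Indy Library", re.I),
    re.compile(r"^(python[-.]?urllib|java/?[1-9]\.[0-9])", re.I),
]

for ua in ("libwww-perl/5.68", "libwww-perl/5.805"):
    hit = any(p.search(ua) for p in patterns)
    print(ua, "matches" if hit else "no match")
```

Both logged agents match, so if those hits really came back 200, the likely culprit is the rule's action rather than the conditions: an internal rewrite to 403.php serves whatever that script outputs with a 200 status unless the script itself sends a 403 header.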
Thank you for any advice.
# Block libwww-perl except from AltaVista, Inktomi, and IA Archiver
#
RewriteEngine On
#.....
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Python.urllib|Java/?[1-9]\.[0-9]) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(python[-.]?urllib|java/?[1-9]\.[0-9]) [NC]
RewriteCond %{REMOTE_ADDR} !^207\.126\.2(2[4-9]|3[0-9])\.
RewriteCond %{REMOTE_ADDR} !^209\.73\.(1[6-8][0-9]|19[01])\.
RewriteCond %{REMOTE_ADDR} !^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteCond %{REMOTE_ADDR} !^209\.237\.23[2-5]\.
RewriteCond %{REMOTE_ADDR} !^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.
#....
RewriteRule !^403.*\.php$ - [F]
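The bracketed octet ranges in these IP patterns are easy to get wrong at the boundaries. A quick Python boundary check of the Google pattern used above (assuming the intended range is 216.239.32.x through 216.239.63.x):

```python
import re

# Pattern from the ruleset above: should cover third octets 32-63 only
google = re.compile(r"^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.")

for third_octet in (31, 32, 63, 64):
    ip = f"216.239.{third_octet}.1"
    print(ip, bool(google.match(ip)))  # only 32 and 63 should print True
```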
Am I correct in assuming the above two hits were not blocked despite having the following lines in my .htaccess file?
Best practice is to check your error logs.
Those lines resulted in a 200 response; however, what is the file size of 403.php when somebody accesses that file directly?
Compare that size to the 200 response sizes in the log lines above, because you are, after all, rewriting those requests to that page.
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
In addition, for the line above to work, you either escape the hyphen or enclose the entire phrase (hyphen and all) in quotes.
Don
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Python.urllib|Java/?[1-9]\.[0-9]) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(python[-.]?urllib|java/?[1-9]\.[0-9]) [NC]
You may solve all this with a single line:
RewriteCond %{HTTP_USER_AGENT} (libwww|perl|Indy|python|urllib|java) [NC]
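Note that this single condition is an unanchored substring match, so it is broader than the anchored originals it replaces. A quick Python illustration (the sample agent strings are made up):

```python
import re

# Don's combined condition ([NC] = re.I), unanchored substring match
catchall = re.compile(r"(libwww|perl|Indy|python|urllib|java)", re.I)

# It matches the intended agents...
print(bool(catchall.search("libwww-perl/5.805")))  # -> True
print(bool(catchall.search("Java/1.4.2")))         # -> True
# ...but also any UA that merely contains one of the substrings
print(bool(catchall.search("SuperlativeBrowser/1.0")))  # -> True ("perl")
```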
Don
FWIW, I always block libwww-perl no matter who's using it, because anyone serious about using a Perl script can identify themselves with one lousy line of code:
$ua->agent("My User Agent");
If they aren't serious enough to add that line why should I take them seriously? ;)
FYI, lazy botnet probes use the default user agent so blocking libwww-perl is a matter of basic site protection.
[webmasterworld.com...]
[edited by: incrediBILL at 1:02 am (utc) on Mar. 17, 2008]
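The Perl line above sets LWP's User-Agent; the equivalent courtesy with Python's standard-library urllib looks like this (the agent string and URL are placeholders):

```python
import urllib.request

# Build a request that identifies the script instead of sending
# Python-urllib's default User-Agent, which many sites block on sight.
req = urllib.request.Request(
    "http://example.com/",
    headers={"User-Agent": "MySiteChecker/1.0 (admin@example.com)"},
)
print(req.get_header("User-agent"))  # the header that would be sent
```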
I agree 100% with Bill's assessment.