libwww

spambot or what?

         

coyote

4:28 am on Jun 20, 2003 (gmt 0)

10+ Year Member



I've seen this libwww critter in my log files a few times, most recently today. Anyone know what it is and should I block it?

jdMorgan

5:04 am on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



coyote,

Welcome to WebmasterWorld [webmasterworld.com]!

libwww is a general-purpose application library for retrieving HTTP documents. As such, it can be good or bad. If it comes from a search engine IP address, it's probably being used for some extended search service, such as document language translation by Google or AltaVista. Some of the GoogleLabs tools use libwww and Python.

Be careful blocking this one, unless you include the IP address range in the equation.

Jim
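To illustrate the point about combining the user-agent with the IP address range, here is a minimal Python sketch. It is not anyone's production rule set: the function name `should_block` and the `TRUSTED_NETWORKS` list are made up for this example, and the networks are reserved documentation ranges standing in for whatever search engine ranges you have actually verified.

```python
import ipaddress

# Hypothetical allowlist. These are reserved documentation ranges used as
# placeholders, NOT real search engine address ranges.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def should_block(user_agent: str, remote_ip: str) -> bool:
    """Return True for a libwww request that comes from outside the allowlist."""
    # Requests without the libwww token are not this rule's business.
    if "libwww" not in user_agent.lower():
        return False
    address = ipaddress.ip_address(remote_ip)
    # Block only when the address is in none of the trusted networks.
    return not any(address in network for network in TRUSTED_NETWORKS)


if __name__ == "__main__":
    print(should_block("libwww-perl/5.65", "203.0.113.7"))
    print(should_block("libwww-perl/5.65", "192.0.2.10"))
```

Any user-agent containing the token matches here, so the test would need refining if some libwww-based clients should always be let through.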

wilderness

5:54 pm on Jun 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Artful net of France has been active recently.

No problem on my end, however. Despair for Dave ;)

Don

rbs10025

11:09 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



The main things I see libwww used for are (a) the Lynx web browser, and (b) an About.com crawler. There are sporadic appearances of libwww in other usage.

The About.com case seems to be link checking (for links from pages on their site), but its behavior can sometimes be a bit weird. For example, today it asked for a certain directory index and, not finding it, started appending filenames...

mxc1s.about.com - - [21/Jun/2003:15:45:32 -0400] "GET /research/d02/ HTTP/1.1" 404 305 "-" "Libby_1.1/libwww-perl/5.65"
mxc1s.about.com - - [21/Jun/2003:15:45:32 -0400] "GET /research/d02/index.html HTTP/1.1" 404 315 "-" "Libby_1.1/libwww-perl/5.65"
mxc1s.about.com - - [21/Jun/2003:15:45:33 -0400] "GET /research/d02/index.htm HTTP/1.1" 404 314 "-" "Libby_1.1/libwww-perl/5.65"
mxc1s.about.com - - [21/Jun/2003:15:45:33 -0400] "GET /research/d02/index.cgi HTTP/1.1" 404 314 "-" "Libby_1.1/libwww-perl/5.65"
mxc1s.about.com - - [21/Jun/2003:15:45:33 -0400] "GET /research/d02/ HTTP/1.1" 404 305 "-" "Libby_1.1/libwww-perl/5.65"
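For anyone who wants to spot this kind of probing without reading the log by eye, here is a rough Python sketch. It assumes the log is in the combined format shown above; the names `LOG_PATTERN`, `parse_line` and `count_404s_by_agent` are invented for this example, and the two sample lines are copied from the entries above.

```python
import re
from collections import Counter

# Regex for the combined log format, as seen in the entries above.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE_LINES = [
    'mxc1s.about.com - - [21/Jun/2003:15:45:32 -0400] "GET /research/d02/ HTTP/1.1" '
    '404 305 "-" "Libby_1.1/libwww-perl/5.65"',
    'mxc1s.about.com - - [21/Jun/2003:15:45:32 -0400] "GET /research/d02/index.html HTTP/1.1" '
    '404 315 "-" "Libby_1.1/libwww-perl/5.65"',
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_404s_by_agent(lines):
    """Count 404 responses per (host, user-agent) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == "404":
            counts[(entry["host"], entry["agent"])] += 1
    return counts


if __name__ == "__main__":
    for (host, agent), hits in count_404s_by_agent(SAMPLE_LINES).items():
        print(host, agent, hits)
```

A host and agent pair that piles up 404s within a second or two is a reasonable hint of filename guessing, though a count alone does not prove it.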

jdMorgan

11:30 pm on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



rbs10025,

Thanks for posting those About.com log entries. Do you have the full UA for a Lynx browser access? (I've been blocking libwww pretty severely, but I'd like to allow Lynx.)

Thanks,
Jim

rbs10025

11:38 pm on Jun 21, 2003 (gmt 0)

10+ Year Member



The Lynx browser should have "Lynx" at the start of the UA string. From today's office server log...

Lynx/2.8.3dev.8 libwww-FM/2.14FM

Lynx/2.8.4pre.5 libwww-FM/2.14

Lynx/2.8.4rel.1 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.6g

And also from today's log, the only non-Lynx and non-About.com usage of libwww...

Mozilla/5.0libwww-perl/5.48 - from wesleyan.edu

NFRLinkChecker/0.9 libwww-perl/5.64 - from ravn.no

libwww-perl/5.64 - from sfc.keio.ac.jp

W3C-checklink/2.89 libwww-perl/5.65 - from ee.ethz.ch

W3C_Validator/1.305.2.12 libwww-perl/5.64 - me using the W3C HTML validator
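Going by the strings above, a simple way to tell Lynx apart from other libwww users is to check the start of the UA first and the libwww token second. Here is a minimal Python sketch of that idea; the function name `classify_user_agent` and the category labels are made up for this example, and the rules are only as good as the handful of samples listed here.

```python
def classify_user_agent(user_agent: str) -> str:
    """Sort a UA string into a rough category based on the samples above."""
    ua = user_agent.strip()
    lowered = ua.lower()
    # Lynx puts its own name first, followed by libwww-FM.
    if ua.startswith("Lynx/"):
        return "lynx"
    # Perl scripts and checkers built on LWP carry the libwww-perl token.
    if "libwww-perl" in lowered:
        return "libwww-perl"
    # Anything else mentioning libwww, including a bare "libwww".
    if "libwww" in lowered:
        return "libwww-other"
    return "other"


if __name__ == "__main__":
    samples = [
        "Lynx/2.8.4rel.1 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.6g",
        "W3C-checklink/2.89 libwww-perl/5.65",
        "libwww",
    ]
    for sample in samples:
        print(classify_user_agent(sample), "<-", sample)
```

Keep in mind that a UA string is trivially faked, so a "lynx" result only says what the client claims to be.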

jdMorgan

12:31 am on Jun 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks!

Jim

coyote

4:09 am on Jun 22, 2003 (gmt 0)

10+ Year Member



Thanks for the replies and log examples.
The instance of libwww in my files was simply "libwww" and it wasn't from a Google IP or anything.
I erased my log after reading Jim's first post, but next time I see it (and I'm sure I will) I'll try blocking it by IP and not name, to make sure SEs, Lynx, etc. can still get in.