
sixapart using libwww-perl


Pfui

4:42 am on May 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



oak-out.sixapart.com
libwww-perl/5.834

robots.txt? NO

From Bill's:

Default User Agents of Programming Libraries and Command Line Tools
Resource page for common user agents
[webmasterworld.com...]
>>
libwww-perl
Example: "libwww-perl/5.805"
This is a general-purpose application library for retrieving HTTP documents, used by Perl scripts typically running on Linux servers. This particular library is often associated with hacking attempts and botnet attacks [webmasterworld.com] and should generally be blocked.
<<
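Blocking on the user-agent string is the usual first step. A minimal sketch in Apache 2.2-era `.htaccess` syntax (mod_rewrite must be enabled; adapt and test on your own server before relying on it -- the pattern and response code here are illustrative, not a recommendation from the quoted page):

```apache
# Hypothetical sketch: return 403 Forbidden to any request whose
# User-Agent header begins with "libwww-perl" (case-insensitive).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [NC]
RewriteRule .* - [F,L]
```

Note that the user-agent string is entirely client-controlled, so this only stops scripts that don't bother to disguise themselves.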

Pfui

5:08 pm on Sep 12, 2010 (gmt 0)




That was then. This is now:

oak-out.sixapart.com
ArcheType/1.0

09/12 09:43:41 /
09/12 09:45:08 /

robots.txt? NO

Pfui

3:38 am on Oct 4, 2010 (gmt 0)




Déjà vu all over again. :)

oak-out.sixapart.com
libwww-perl/5.834

robots.txt? NO

rIPs for just "oak-out.sixapart.com": 204.9.180.100 - 204.9.180.209
See: [robtex.com...] (link currently timing out; info found via G cache.)
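That range doesn't align to a single CIDR boundary, so anyone wanting deny rules narrower than the whole /24 has to split it up. A quick sketch using Python's standard-library ipaddress module (the range is as reported above -- verify it independently before deploying blocks):

```python
import ipaddress

# Convert the reported range (204.9.180.100 - 204.9.180.209, per the
# post above) into the minimal set of CIDR blocks, suitable for
# firewall or .htaccess "Deny from" rules.
start = ipaddress.IPv4Address("204.9.180.100")
end = ipaddress.IPv4Address("204.9.180.209")
cidrs = [str(net) for net in ipaddress.summarize_address_range(start, end)]
print(cidrs)
# ['204.9.180.100/30', '204.9.180.104/29', '204.9.180.112/28',
#  '204.9.180.128/26', '204.9.180.192/28', '204.9.180.208/31']
```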

enigma1

12:06 pm on Oct 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was wondering why you would expect any of the libwww clients to read robots.txt -- other than scraping content and hacking servers, they're not doing anything else.

Pfui

5:34 pm on Oct 5, 2010 (gmt 0)




Generally speaking, I don't. But some of the command-line tools -- Wget, for one -- can read-and-heed robots.txt, even if the feature is all too often overridden. Anyway, because this is a spider forum and spotting/discussing bad spiders/bot-runners/hosts is its raison d'être (more or less), I include the "robots.txt?" info with every report.
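For reference, Wget's robots handling and its override look like this (illustrative commands against a placeholder host; Wget applies the Robots Exclusion Standard during recursive retrieval, and the `-e robots=off` switch disables it):

```shell
# Recursive fetch: Wget reads robots.txt and skips disallowed paths.
wget --recursive --level=1 http://example.com/

# Same fetch with robots handling switched off -- one flag is all
# it takes, which is why compliance is worth logging but never assuming.
wget --recursive --level=1 -e robots=off http://example.com/
```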