
Ill-behaved libwww-perl UA

Reads and ignores robots.txt


jdMorgan

4:49 am on Sep 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This UA filled my log with 404s: it requested robots.txt and ignored it, attempted to load files in disallowed subdirectories, and requested many, many files that have never existed.

Requests were made from IPs in the range 209.237.232.17 - 209.237.232.24.
Their assigned netblock is much larger.

209.237.232.24 - - [15/Sep/2002:23:13:47 -0400] "GET /robots.txt HTTP/1.0" 200 857 "-" "libwww-perl/5.53"

Another "service company" with a poorly-written script.
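A script that never sets its own agent name shows up in the logs under the bare library identifier, which makes it easy to pick out. A minimal sketch (the second log line and the variable names are illustrative, not from Jim's logs):

```python
import re

# Sample lines in Apache combined log format; the first is the entry
# quoted above, the second is a made-up normal visitor for contrast.
LOG = [
    '209.237.232.24 - - [15/Sep/2002:23:13:47 -0400] "GET /robots.txt HTTP/1.0" 200 857 "-" "libwww-perl/5.53"',
    '10.0.0.1 - - [15/Sep/2002:23:14:02 -0400] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"',
]

# The default LWP::UserAgent identifier is "libwww-perl/<version>",
# so any line ending with it came from a script that never named itself.
DEFAULT_LWP = re.compile(r'"libwww-perl/[\d.]+"$')

# Collect the client IP (first field) of every matching line.
suspects = [line.split()[0] for line in LOG if DEFAULT_LWP.search(line)]
print(suspects)  # ['209.237.232.24']
```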

Jim

andreasfriedrich

10:22 am on Sep 16, 2002 (gmt 0)




It is very easy to write a well-behaved robot in Perl using the LWP::RobotUA class, whose constructor requires you to give your robot a name. Since this UA reports the default LWP::UserAgent name, "libwww-perl/5.53", it was evidently never intended to be a nice robot.
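For contrast, a polite client checks robots.txt before every fetch; in Perl, LWP::RobotUA does this automatically. The same check, sketched with Python's standard library (the robots.txt content, bot name, and URLs here are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt like the one the misbehaving UA fetched and ignored.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved robot asks before each request instead of blindly fetching.
print(rp.can_fetch("MyBot/1.0", "http://example.com/private/secret.html"))  # False
print(rp.can_fetch("MyBot/1.0", "http://example.com/index.html"))           # True
```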

jdMorgan

4:15 pm on Sep 17, 2002 (gmt 0)




Doh!

This was ia_archiver, apparently following a 301-redirect from a very old domain. I saw it getting 403'ed in the logs this morning, this time using its correct User-agent string.

Please disregard this report. When I'm wrong, I say I'm wrong... :(

Jim