About/0.1libwww-perl/5.47

         

Marcia

4:37 am on Jan 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This comes around regularly, like clockwork; it doesn't miss a week.

209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/ HTTP/1.0" 404 211 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/index.html HTTP/1.0" 404 221 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET //index.htm HTTP/1.0" 404 220 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/index.cgi HTTP/1.0" 404 220 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/ HTTP/1.0" 404 211 "-" "About/0.1libwww-perl/5.47"

Interesting that it's looking for those variations of file extensions, but what I'm wondering about is that double forward slash: "GET //index.htm HTTP/1.0"

littleman

5:26 am on Jan 19, 2002 (gmt 0)



My first guess would be a poorly written link extraction script that isn't parsing properly.
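That guess fits the log line: naive string concatenation of a base URL ending in "/" with a root-relative href beginning in "/" produces exactly that doubled slash. A minimal Python sketch (the URLs are made up for illustration):

```python
from urllib.parse import urljoin

base = "http://example.com/"   # hypothetical page the bot crawled
href = "/index.htm"            # root-relative link found on that page

# A sloppy link extractor that just glues the strings together:
naive = base + href
print(naive)             # http://example.com//index.htm  ->  "GET //index.htm"

# Proper URL resolution collapses the duplicate slash:
print(urljoin(base, href))    # http://example.com/index.htm
```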

wilderness

6:51 am on Jan 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That bot is persistent. I get this at least a dozen times a week, sometimes twice a day:
209.143.212.233 - - [02/Jan/2002:19:57:35 -0800] "GET / HTTP/1.0" 403 - "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [02/Jan/2002:19:57:35 -0800] "GET /index.html HTTP/1.0" 403 - "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [02/Jan/2002:19:57:35 -0800] "GET /index.htm HTTP/1.0" 403 - "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [02/Jan/2002:19:57:35 -0800] "GET /index.cgi HTTP/1.0" 403 - "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [02/Jan/2002:19:57:35 -0800] "GET / HTTP/1.0" 403 - "-" "About/0.1libwww-perl/5.47"
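For what it's worth, a 403 on every request like the ones above is what an Apache deny-by-user-agent rule produces. A minimal sketch using mod_setenvif (the pattern and environment-variable name are assumptions; adjust to taste):

```apache
# .htaccess -- flag anything identifying as libwww-perl, then deny it
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```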

Marcia

7:18 am on Jan 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wilderness, is there a link to your site from about.com? I'm really not sure what that's doing.

wilderness

6:34 pm on Jan 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hey Marcia,
Yes, there is a link from About to my site, from Cindy Pierson's horse page.
Cindy Pierson's pages are primarily Thoroughbred horses; my site is 99.9999% Standardbreds.

After multiple requests and a full year, About/Cindy provided a link to my site. When they did, it was FRAMED.
I asked Cindy and About to remove my URL from their pages. They inquired as to WHY.
Even after the frame was removed, I still wasn't happy with the presentation.
The easiest solution was to deny them.

The most peculiar thing happened as a result :-(
Within hours I was besieged by bots related to About/Global Crossing/Thunderstone/Road Runner.

amoore

7:59 pm on Jan 19, 2002 (gmt 0)

10+ Year Member



That's pretty much the default user-agent string you get when you use the standard Perl modules to pull web pages. If you write a Perl script that fetches pages and don't do anything special to set a user-agent, odds are you're using LWP, and it will identify itself as something like "libwww-perl...". It's pretty common to see that from "amateur" bots and scripts. My logs are full of accesses from "libwww-perl/5.49" and similar user-agents.
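The same pattern exists in other languages' stock HTTP clients. For example, Python's urllib advertises "Python-urllib/&lt;version&gt;" unless the script author overrides it, just as LWP defaults to a "libwww-perl/..." string (the bot name and URL below are hypothetical):

```python
import urllib.request

# build_opener() installs a default User-agent header automatically
opener = urllib.request.build_opener()
print(dict(opener.addheaders)["User-agent"])   # e.g. "Python-urllib/3.11"

# A well-behaved script replaces it with something identifiable:
opener.addheaders = [("User-agent", "MyBot/1.0 (+http://example.com/bot.html)")]
```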

wilderness

12:03 am on Jan 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My logs are full of denies from
["amateur" bots or scripts like that. My logs are full of access from "libwww-perl/5.49" and similar user-agents.]

As I mentioned earlier, IMO it is not very professional for an uninvited spidering bot to expect free access to somebody's extensive effort without providing a defined reason (a URL) and stating its intended use.

Not much difference between the above kind of bots and addresses.com, mentioned in an adjoining discussion, although these bots do tend to read and sometimes abide by robots.txt.
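For the ones that do honor robots.txt, a targeted disallow is the lighter-weight option; a sketch, assuming the crawler matches on the "libwww-perl" token seen in the logs above:

```
User-agent: libwww-perl
Disallow: /
```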