209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/ HTTP/1.0" 404 211 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/index.html HTTP/1.0" 404 221 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET //index.htm HTTP/1.0" 404 220 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/index.cgi HTTP/1.0" 404 220 "-" "About/0.1libwww-perl/5.47"
209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/ HTTP/1.0" 404 211 "-" "About/0.1libwww-perl/5.47"
Interesting that it's looking for those variations of file extensions, but what I'm wondering about is that double forward slash: "GET //index.htm HTTP/1.0"
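For anyone who wants to pull requests like these out of a larger log, here is a minimal Python sketch. It assumes the lines follow the combined log format shown above; the names `LOG_PATTERN`, `parse_line`, `paths_by_agent` and `double_slash_paths` are made up for illustration. It only finds the double-slash requests and groups paths by user agent. It does not explain why the bot sends them.

```python
import re
from collections import defaultdict

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Two of the lines quoted in the post above.
SAMPLE_LINES = [
    '209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET /directory/ HTTP/1.0" 404 211 "-" "About/0.1libwww-perl/5.47"',
    '209.143.212.233 - - [18/Jan/2002:13:45:59 -0500] "GET //index.htm HTTP/1.0" 404 220 "-" "About/0.1libwww-perl/5.47"',
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def paths_by_agent(lines):
    """Group (path, status) pairs by user-agent string."""
    grouped = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry:
            grouped[entry["agent"]].append((entry["path"], int(entry["status"])))
    return dict(grouped)


def double_slash_paths(lines):
    """List requested paths that contain a doubled forward slash."""
    return [e["path"] for e in map(parse_line, lines) if e and "//" in e["path"]]


if __name__ == "__main__":
    print(paths_by_agent(SAMPLE_LINES))
    print(double_slash_paths(SAMPLE_LINES))
```

Running it over a full access log instead of `SAMPLE_LINES` would show whether the doubled slash turns up with other user agents too.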
After multiple requests and a full year, About/Cindy provided a link to my site. When they did, it was FRAMED.
I asked Cindy & About to remove my URL from their pages. They asked WHY.
Even after the frame was removed I still wasn't happy with the presentation.
The easiest solution was to deny.
The most peculiar thing happened as a result :-(
Within hours I was besieged by bots related to About/Global Crossing/Thunderstone/Road Runner.
As I mentioned earlier, IMO it is not very professional for any uninvited bot to expect free access to somebody's extensive effort without providing a defined reason (URL) and intended use for its spidering.
There's not much difference between the above kind of bots and addresses.com, mentioned in an adjoining discussion, although these bots do tend to read, and sometimes abide by, robots.txt.
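Since robots.txt came up, here is a small Python sketch for checking what a given set of rules would tell a bot, using the standard library's `urllib.robotparser`. The rules in `ROBOTS_TXT` and the helper `is_allowed` are hypothetical. In particular, the `About` user-agent token is only a guess based on the agent string in the log, and none of this has any effect on a bot that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out any agent whose name contains "About",
# leave everything open to everyone else.
ROBOTS_TXT = """\
User-agent: About
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "About/0.1libwww-perl/5.47", "/directory/"))
    print(is_allowed(ROBOTS_TXT, "ExampleBot/1.0", "/directory/"))
```

This only tests what the rules say. Whether a bot actually obeys them still has to be checked against the access log.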