csci_b659/0.13

wilderness

3:27 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



156.56.103.16 - - [22/Mar/2006:05:16:26 -0800] "GET /robots.txt HTTP/1.1" 206 3688 "-" "csci_b659/0.13"
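
For anyone who wants to pull entries like this out of their own logs, here is a minimal Python sketch. It assumes the Apache "combined" log format, which is what the line above appears to be; the names LOG_PATTERN and parse_log_line are made up for illustration. Note that status 206 is Partial Content, meaning the server answered a Range request rather than sending the whole file.

```python
import re

# Field layout assumed from the quoted line (Apache "combined" format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # make the status comparable as a number
    return fields


entry = parse_log_line(
    '156.56.103.16 - - [22/Mar/2006:05:16:26 -0800] '
    '"GET /robots.txt HTTP/1.1" 206 3688 "-" "csci_b659/0.13"'
)
```

Filtering on entry["agent"] or entry["status"] then makes it easy to list every hit from a given crawler.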

Pfui

8:05 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"csci_b659 [informatics.indiana.edu]" hails from .ucs.indiana.edu

I recommend keeping an eye on it -- prior experience with that school's Web Mining students suggests that their numerous projects [informatics.indiana.edu] may not be as well behaved as they first appear. E.g.:

Spring, 2006
1. Usage Statistics of Robots Exclusion Standard
[Direct link not included because it's to a blog and links to blogs violate TOS.]

"Crawl the URLs in the robots.txt. This would violate the robot exclusion standards but..."

("But"? I don't think so.)
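
For contrast, here is a rough Python sketch of what a compliant crawler is expected to do before fetching a URL, using the standard library's urllib.robotparser. The robots.txt rules, the example.com URLs, and the is_allowed helper are all invented for illustration; nothing here is taken from the student project itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_index = is_allowed(
    ROBOTS_TXT, "csci_b659/0.13", "http://example.com/index.html"
)
allowed_private = is_allowed(
    ROBOTS_TXT, "csci_b659/0.13", "http://example.com/private/page.html"
)
```

A well-behaved bot skips any URL for which this check returns False; crawling the Disallow entries does the exact opposite.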

Staffa

11:03 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I noticed the same today: a 206 on robots.txt, then straight for the index page.
Banned them, like all other uni crawlers; they're just a waste of bandwidth. Should they later come up with the best SE since sliced bread, they can always come back and try again ;o)
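
The post does not say how the ban was set up, so the following is only one possible approach: a small Python WSGI wrapper that answers 403 to any request whose User-Agent starts with a banned prefix. The names BANNED_AGENT_PREFIXES and ban_crawlers are hypothetical, and the same effect is often achieved in the web server's own configuration instead.

```python
# Hypothetical ban list; prefixes are matched case-insensitively.
BANNED_AGENT_PREFIXES = ("csci_b659",)


def ban_crawlers(app, banned=BANNED_AGENT_PREFIXES):
    """Wrap a WSGI app so requests from banned user agents get a 403."""
    prefixes = tuple(prefix.lower() for prefix in banned)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if agent.startswith(prefixes):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

User-Agent strings are trivially changed, so a ban like this only stops crawlers that keep identifying themselves honestly.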