Tamu_cs_irl_crawler/1.0

another edu one

bull

12:32 pm on Aug 14, 2004 (gmt 0)

10+ Year Member



From 128.194.135.80. No info page is given in the UA. If you make an HTTP request to this IP, though, you get a "description" page which says:

IRL-crawler is a Texas A&M research project sponsored by the National Science Foundation that investigates algorithms for mapping the topology of the Internet and discovering the various parts of the web. The crawler downloads random web pages (text only) and follows randomly selected links to find other websites. With the exception of public traceroute servers, each website is visited only once in the entire execution of the program.

No mention of robots.txt, which was not fetched anyway. Only one visit so far, requesting "/".

This site can be properly viewed only in MS Internet Explorer 6.0+.

Impudent.

bull

11:10 pm on Mar 1, 2005 (gmt 0)

pendanticist

11:55 pm on Mar 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also: [webmasterworld.com...]

volatilegx

3:52 pm on Mar 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



IRL crawler's official page [irl.cs.tamu.edu]

With the exception of embedded URLs, the text of the downloaded web pages (including any personal information) is not remembered, indexed, or used for any purposes.

surfin2u

12:53 pm on Mar 18, 2005 (gmt 0)

10+ Year Member



My site's been crawled by irlbot lately. I checked their info page and it claims they take only 1 page per minute. It took my home page twice in the span of 2 seconds this morning. I put an entry in robots.txt to exclude them. Will see if that works.
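
For anyone wanting to sanity-check an exclusion entry like the one described above before relying on it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `IRLbot` token, the sample robots.txt text, and the `is_allowed` helper are all assumptions for illustration; the thread does not confirm which user-agent token the crawler actually matches against, so check your own logs for the string it sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "IRLbot" token is an assumption;
# use whatever user-agent token the crawler really identifies itself with.
ROBOTS_TXT = """\
User-agent: IRLbot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("IRLbot", "Tamu_cs_irl_crawler/1.0", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://example.com/"))
```

Note that under this parser the older `Tamu_cs_irl_crawler/1.0` string would not be caught by an `IRLbot` entry, so the token in robots.txt has to line up with the UA the bot sends. This only tells you what a compliant parser would decide; whether the crawler honors robots.txt at all is a separate question your access log has to answer.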