Internet Archive using Nutch

keyplyr

9:42 am on Sep 28, 2005 (gmt 0)

***.***.***.* - - [27/Sep/2005:15:33:19 -0700] "GET /robots.txt HTTP/1.0" 200 2016 "-" "InternetArchive/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"

The IP does belong to the Internet Archive. Anyone know why they're using Nutch?

volatilegx

11:13 pm on Nov 7, 2005 (gmt 0)

Let's take this discussion outside the Search Engine Spider Identification forum. The ethics of archiving publicly accessible material (as interesting as I find it) are way outside this forum's topic.

wilderness

11:20 pm on Nov 7, 2005 (gmt 0)

Removed per Dan's request.

Edited by wilderness.
