crawler0.archive.org

Noticed it for the first time

rjohara

8:14 pm on Dec 21, 2001 (gmt 0)

A number of threads on WmW have discussed archive.org, which keeps a public database of entire websites going back for many years. Some people think it's good, some think it's bad.

I was visited this morning for the first time by an archive.org spider (crawler0.archive.org; I don't have an IP address for it). Perhaps they've always been around and I just never noticed them until I started following this topic; I don't know. It was perfectly well-behaved: it picked up robots.txt first and then four or five other pages.

Submitted for the record.

wilderness

7:53 pm on Dec 22, 2001 (gmt 0)

www.archive.org

From that page: The Internet Archive, working with Alexa Internet, has created the Wayback Machine. The Wayback Machine makes it possible to surf pages stored in the Internet Archive's web archive. The Wayback Machine was unveiled on October 24th at Berkeley's Bancroft Library. Visit the Wayback Machine by entering an URL above or clicking on specific collections below.
end of quote

This explains the Berkeley visits I've been getting.

I'm not too enthused about anybody who would use Alexa (ia_archiver) to gather data, especially somebody whose purpose is to remain objective.

The site does look interesting, though, and perhaps I can find some pages long removed from the web :-)
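
For anyone who wants to see how a robots.txt file would treat the ia_archiver user agent mentioned above, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt rules and the example.com URLs are made up for illustration, and whether the archive.org crawler actually matches on that token is an assumption, not something confirmed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; these rules are illustrative only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "http://example.com/private/page.html"),
        ("ia_archiver", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/private/page.html"),
    ]:
        print(agent, url, allowed(agent, url))
```

With these made-up rules, ia_archiver is kept out of /private/ but may fetch everything else, while other agents fall through to the catch-all entry. This only shows what the file asks for; it is up to each crawler to honor it.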