
Worio

don't worio, be happy

         

incrediBILL

6:04 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




198.162.51.70 [worbo2.cs.ubc.ca.]
"Mozilla/5.0 (compatible; heritrix/1.6.0 +http://www.worio.com/)"

Claims to be starting beta in Sept 2006 - blah

GaryK

8:52 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Based on my notes, the "heritrix" name has been associated with the Wayback Machine. I've got notes on it going back to 2000.

os-heritrix/0.5.0 ( [crawler.archive.org)...]
mozilla/5.0 (compatible; heritrix/1.0.4 [non-exist.com)...]
mozilla/5.0 (compatible; heritrix/1.3.0 [archive.crawler.org)...]
mozilla/5.0 (compatible; heritrix/1.3.0 [crawler.archive.org)...]
mozilla/5.0 (compatible; heritrix/1.2.0 [lab.mokk.bme.hu...]
Mozilla/5.0 (compatible; heritrix/1.4.0 PROJECT_URL_HERE)
Mozilla/5.0 (compatible; heritrix/1.3.0 [l3s.de...]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; heritrix/1.3.0 [cs.washington.edu...]
Mozilla/5.0 (compatible; heritrix/1.5 [metacarta.com)...]
Mozilla/5.0 (compatible; heritrix/1.6.0 [worio.com...]
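
For anyone comparing notes on strings like these, here is a minimal Python sketch of pulling the Heritrix version out of a user-agent string. The function name and the regex are assumptions for illustration only, not anything taken from Heritrix itself.

```python
import re
from typing import Optional

# Assumed pattern: "heritrix/" followed by a dotted version number.
# Case-insensitive so it also matches variants such as "os-heritrix/0.5.0".
HERITRIX_VERSION = re.compile(r"heritrix/(\d+(?:\.\d+)*)", re.IGNORECASE)


def heritrix_version(user_agent: str) -> Optional[str]:
    """Return the Heritrix version found in a user-agent string, or None."""
    match = HERITRIX_VERSION.search(user_agent)
    return match.group(1) if match else None
```

Strings with no "heritrix/" token return None, so the same helper can be run over a whole log without filtering first.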

incrediBILL

9:01 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Heritrix is from the Wayback Machine, but they released it publicly, and this is the first time I've ever seen WORIO using it.

Last time I saw it was this company:

209.128.119.nnn "Mozilla/5.0 (compatible; heritrix/1.6.0 +http://innovationblog.com)"

Which is related to accelovation.com, as the blog is registered to a principal @ accelovation, and they appear to be a data sniffer company.

GaryK

9:04 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Bill. I guess I missed the thread where that was discussed.

Pfui

8:26 pm on Jun 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Same Host info as Bill (University of British Columbia). Also, no robots.txt request, and a malformed referer --

worbo2.cs.ubc.ca
Mozilla/5.0 (compatible; heritrix/1.6.0 +http://www.worio.com/)
Date   Time      Page                St.  Referer
06/11  11:51:52  /filename.html      403  [example1.com...]
06/11  11:52:24  /dir/filename.html  403  [example2.com...]

[Hh]eritrix has long been on my 403 hit list.
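
That bracketed pattern is regex shorthand for matching either capitalization. A minimal sketch of the same check in Python, assuming a hypothetical helper `status_for` that picks the response status from the User-Agent header (in practice a rule like this would more likely live in the web server config):

```python
import re

# Deny pattern taken from the "[Hh]eritrix" shorthand above.
BLOCKED_UA = re.compile(r"[Hh]eritrix")


def status_for(user_agent: str) -> int:
    """Return 403 for user agents matching the deny pattern, else 200."""
    return 403 if BLOCKED_UA.search(user_agent or "") else 200
```

Because the pattern is unanchored, it matches anywhere in the string, so it catches both the "Mozilla/5.0 (compatible; heritrix/..." form and the older "os-heritrix/..." form.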

If only I had a nickel for every wonky bot-running .cs. student/class...