whose crawler is this?

It looks like the IBM crawler, but it didn't request robots.txt, and the IP address resolves to research.archive.org.
And it changed its name!
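A reverse lookup on its own can be spoofed by whoever controls the PTR record, so a common check is forward-confirmed reverse DNS: resolve the IP to a hostname, resolve that hostname back to addresses, and trust the name only if the original IP is among them. Below is a minimal Python sketch of that check. The function name `verify_crawler_ip` is hypothetical, and the resolver functions are injectable so the logic can be tried without network access. It only reports whether the round trip matches; it says nothing about who actually operates the host.

```python
import socket
from typing import Callable, Optional, Tuple, List

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

def verify_crawler_ip(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,     # ip -> (hostname, aliases, addrs)
    forward: Resolver = socket.gethostbyname_ex,  # name -> (hostname, aliases, addrs)
) -> Optional[str]:
    """Return the hostname if ip -> hostname -> ip round-trips, else None."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return None
    # Only trust the PTR name if it resolves back to the requesting IP.
    return hostname if ip in addresses else None

if __name__ == "__main__":
    # Live lookup; the result depends on current DNS, not on 2004 records.
    print(verify_crawler_ip("209.237.233.203"))
```

With the default resolvers this performs real DNS queries, so the output will reflect today's records rather than what the log above saw.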

209.237.233.203 - - [08/Apr/2004:06:43:08] "GET / HTTP/1.0" 200 4278 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:06:43:09] "GET /dir/page1.html HTTP/1.0" 200 8681 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:06:50:18] "GET /page2.html HTTP/1.0" 200 3801 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:06:50:18] "GET /page3.html HTTP/1.0" 200 2846 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:06:50:19] "GET / HTTP/1.0" 200 4278 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:06:50:20] "GET /dir/ HTTP/1.0" 200 4422 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:06:50:20] "GET /dir/page1.html HTTP/1.0" 200 8681 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:06:50:22] "GET /page4.html HTTP/1.0" 200 15061 "-" "http://almaden.ibm.com/cs/crawler/focus" 
209.237.233.203 - - [08/Apr/2004:09:02:22] "GET / HTTP/1.0" 200 4278 "-" "me" 
209.237.233.203 - - [08/Apr/2004:09:02:23] "GET /dir/page1.html HTTP/1.0" 200 8681 "-" "me" 
209.237.233.203 - - [08/Apr/2004:09:08:16] "GET /page2.html HTTP/1.0" 200 3801 "-" "me" 
209.237.233.203 - - [08/Apr/2004:09:08:16] "GET /page3.html HTTP/1.0" 200 2846 "-" "me" 
209.237.233.203 - - [08/Apr/2004:09:08:17] "GET / HTTP/1.0" 200 4278 "-" "me" 
209.237.233.203 - - [08/Apr/2004:09:08:17] "GET /dir/ HTTP/1.0" 200 4422 "-" "me" 
209.237.233.203 - - [08/Apr/2004:09:08:18] "GET /dir/page1.html HTTP/1.0" 200 8681 "-" "me" 
209.237.233.203 - - [08/Apr/2004:09:08:19] "GET /page4.html HTTP/1.0" 200 15061 "-" "me"

209.237.235.158 - - [09/Apr/2004:07:57:16 +1000] "GET /robots.txt HTTP/1.0" 200 398 "-" "os-heritrix/0.6.0 (+http://crawler.archive.org)"
209.237.235.158 - - [09/Apr/2004:07:57:19 +1000] "GET /downloads/file1.doc HTTP/1.0" 200 24576 "-" "os-heritrix/0.6.0 (+http://crawler.archive.org)"
209.237.235.158 - - [09/Apr/2004:08:20:29 +1000] "GET /sitecheck.internetseer.com HTTP/1.0" 404 1229 "-" "os-heritrix/0.6.0 (+http://crawler.archive.org)"
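Spotting this kind of behaviour by eye gets hard in a large log. The Python sketch below groups combined-format lines like the ones above by client IP and reports how many requests each IP made, which user-agent strings it used, and whether it ever fetched /robots.txt. The regex assumes the layout shown in this post (timestamps with or without a timezone offset); `summarize` and `LOG_RE` are illustrative names, not part of any existing tool.

```python
import re
from collections import defaultdict

# Assumed layout: IP ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Group log lines by IP: request count, user agents, robots.txt fetched?"""
    summary = defaultdict(lambda: {"hits": 0, "agents": set(), "robots": False})
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that don't fit the assumed format
        entry = summary[m["ip"]]
        entry["hits"] += 1
        entry["agents"].add(m["agent"])
        parts = m["request"].split()
        if len(parts) >= 2 and parts[1] == "/robots.txt":
            entry["robots"] = True
    return dict(summary)

SAMPLE = [
    '209.237.233.203 - - [08/Apr/2004:06:43:08] "GET / HTTP/1.0" 200 4278 "-" "http://almaden.ibm.com/cs/crawler/focus"',
    '209.237.233.203 - - [08/Apr/2004:09:02:22] "GET / HTTP/1.0" 200 4278 "-" "me"',
    '209.237.235.158 - - [09/Apr/2004:07:57:16 +1000] "GET /robots.txt HTTP/1.0" 200 398 "-" "os-heritrix/0.6.0 (+http://crawler.archive.org)"',
]

if __name__ == "__main__":
    for ip, info in summarize(SAMPLE).items():
        print(ip, info["hits"], info["robots"], sorted(info["agents"]))
```

An IP that shows several user agents and `robots` still False, like 209.237.233.203 here, is the pattern described in the question.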


dcrombie

Staffa

dcrombie

gojomo

jdMorgan
