

Search Engine Spider and User Agent Identification Forum

    
archive.org bot/1.13.1x
successor to ia_archiver?
zCat




msg:3472258
 10:41 pm on Oct 8, 2007 (gmt 0)

Just noticed a bunch of requests like this in my logs for the first time:

208.70.24.237 - - [09/Oct/2007:00:33:19 +0200] "GET /widgets.html HTTP/1.0" 200 15915 "http://example.com/other-widgets.html" "Mozilla/5.0 (compatible; archive.org_bot/1.13.1x +http://crawler.archive.org)"

IP resolves to archive.org, so I presume it's genuine and a more informative successor to the plain "ia_archiver".
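A minimal sketch of how that "IP resolves to archive.org" check could be automated, assuming a forward-confirmed reverse DNS lookup is what is wanted: reverse-resolve the IP, check the hostname sits under the expected domain, then resolve that hostname forward and confirm it maps back to the same IP. The function name and the injectable `reverse`/`forward` parameters are illustrative choices, not anything from the thread.

```python
import socket


def is_verified_host(ip, domain,
                     reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to a host under `domain`
    and that host resolves forward to the same `ip`."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                        # covers socket.herror / gaierror
        return False

    hostname = hostname.rstrip(".").lower()
    domain = domain.lower()
    # Accept the domain itself or a true subdomain, so that a name like
    # "archive.org.evil.test" does not pass.
    if hostname != domain and not hostname.endswith("." + domain):
        return False

    try:
        addresses = forward(hostname)[2]   # list of IPv4 addresses
    except OSError:
        return False
    return ip in addresses
```

Called as `is_verified_host("208.70.24.237", "archive.org")` it performs live DNS lookups, so the answer depends on whatever the DNS returns at the time. The reverse lookup alone is not proof, since whoever controls the IP block also controls its PTR record; the forward step is what closes that gap.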

The web page at [crawler.archive.org...] actually exists, but it leaves me unsure which UA to put in robots.txt (archive.org_bot? heritrix?), though it's getting 403s now anyway.
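For trying out a candidate token before committing to it, here is a rough sketch using Python's standard `urllib.robotparser`. It assumes `archive.org_bot` is the token in question; the thread does not confirm which token the crawler actually honors, so this only shows how a given rule set would be interpreted by a parser, not how the crawler itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the "archive.org_bot" token is an assumption
# taken from the User-Agent string in the log line, not a confirmed value.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /
"""


def allowed(token, path, robots_txt=ROBOTS_TXT):
    """Return True if a crawler identifying as `token` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, path)


if __name__ == "__main__":
    for token in ("archive.org_bot", "Googlebot"):
        print(token, allowed(token, "/widgets.html"))
```

Pass the short product token here rather than the full `Mozilla/5.0 (compatible; ...)` header, since the parser matches on the token.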

 

SEOPTI




msg:3476911
 9:27 pm on Oct 13, 2007 (gmt 0)

Scam as usual. I ban this with SetEnvIf in .htaccess, since those sites almost never respect robots.txt.
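The exact directive used is not shown in the thread. As a stand-in, this is a Python analogue (not Apache configuration) of the matching step such a ban relies on: pull the User-Agent out of a combined-format log line and test it against a case-insensitive pattern. The pattern and the function names are assumptions made for illustration.

```python
import re

# Assumed pattern; a real ban list would likely carry more entries.
BLOCKED_UA = re.compile(r"archive\.org_bot", re.IGNORECASE)

# In the combined log format the User-Agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent_from_log(line):
    """Return the User-Agent field of a combined-format log line, or ''."""
    match = LAST_QUOTED_FIELD.search(line)
    return match.group(1) if match else ""


def status_for(user_agent):
    """Return 403 for a blocked User-Agent, 200 otherwise."""
    return 403 if BLOCKED_UA.search(user_agent) else 200


if __name__ == "__main__":
    line = ('208.70.24.237 - - [09/Oct/2007:00:33:19 +0200] '
            '"GET /widgets.html HTTP/1.0" 200 15915 '
            '"http://example.com/other-widgets.html" '
            '"Mozilla/5.0 (compatible; archive.org_bot/1.13.1x '
            '+http://crawler.archive.org)"')
    print(status_for(user_agent_from_log(line)))
```

Matching on the User-Agent only stops clients that announce themselves honestly; anything that changes its header passes straight through.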


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved