Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
archive.org bot/1.13.1x
successor to ia_archiver?
zCat

10+ Year Member



 
Msg#: 3472256 posted 10:41 pm on Oct 8, 2007 (gmt 0)

Just noticed a bunch of stuff like this for the first time:

208.70.24.237 - - [09/Oct/2007:00:33:19 +0200] "GET /widgets.html HTTP/1.0" 200 15915 "http://example.com/other-widgets.html" "Mozilla/5.0 (compatible; archive.org_bot/1.13.1x +http://crawler.archive.org)"

IP resolves to archive.org, so I presume it's genuine and a more informative successor to the plain "ia_archiver".
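For anyone who wants to script that check, here is a rough sketch in Python: pull the IP and UA out of a combined-format log line, then do a forward-confirmed reverse DNS lookup (the PTR name must sit under the expected domain, and that name must resolve back to the same IP). The names `parse_line` and `is_verified` are made up for this example, and the `reverse`/`forward` parameters exist only so the lookups can be swapped out for testing.

```python
import re
import socket

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def is_verified(ip, domain, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    reverse: callable ip -> hostname (defaults to a real PTR lookup)
    forward: callable hostname -> list of IPv4 addresses (defaults to a real lookup)
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
        # Compare on a label boundary so "archive.org.evil.example" fails.
        if host != domain and not host.endswith("." + domain):
            return False
        # The claimed hostname must resolve back to the same address.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

Usage would be something like `is_verified(parse_line(line)["ip"], "archive.org")`. A UA string alone proves nothing, since anyone can send it; the DNS round trip is what ties the request to the domain.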

The web page at [crawler.archive.org...] actually exists, but it leaves me unsure which UA token to put in robots.txt (archive.org_bot? heritrix?). It's getting 403s now anyway.

 

SEOPTI

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3472256 posted 9:27 pm on Oct 13, 2007 (gmt 0)

Scam as usual. I ban this with SetEnvIf in .htaccess, since those sites almost never respect robots.txt.
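The SetEnvIf approach boils down to this: tag a request when its User-Agent header matches a regex, then deny tagged requests. Below is a rough Python sketch of that decision only, not the Apache config itself. The pattern and the `status_for` name are assumptions made up for illustration; substitute whatever agents you actually want to refuse.

```python
import re

# Hypothetical block list. The match is case-sensitive, as with SetEnvIf
# (SetEnvIfNoCase would be the case-insensitive variant).
BLOCKED_AGENTS = re.compile(r"archive\.org_bot|ia_archiver")


def status_for(user_agent):
    """Return 403 for a blocked User-Agent, 200 otherwise."""
    if BLOCKED_AGENTS.search(user_agent or ""):
        return 403
    return 200
```

Unlike robots.txt, this does not rely on the crawler cooperating, but it only works as long as the bot keeps announcing itself with a matching UA string.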

[edited by: SEOPTI at 9:27 pm (utc) on Oct. 13, 2007]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved