Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
memorybot
from: archivethe.net
Pfui




msg:4681586
 6:04 pm on Jun 20, 2014 (gmt 0)

37.16.72.207
Mozilla/5.0 (compatible; memorybot/1.20.71 +http://archivethe.net/en/index.php/about/internet_memory1 on behalf of DNB)

robots.txt? Yes

Internet Memory Research
Parent range: 37.16.72.0 - 37.16.79.255
CIDR: 37.16.72.0/21
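
For anyone who wants to match hits against the range above, here is a minimal sketch using Python's standard `ipaddress` module. The names `MEMORYBOT_RANGE` and `in_memorybot_range` are made up for illustration, and the sketch assumes the /21 reported above is still the range the bot crawls from.

```python
import ipaddress

# Range as reported in this post; it may change over time.
MEMORYBOT_RANGE = ipaddress.ip_network("37.16.72.0/21")


def in_memorybot_range(ip: str) -> bool:
    """Return True if the given IPv4 address falls inside the reported /21."""
    return ipaddress.ip_address(ip) in MEMORYBOT_RANGE


# First and last addresses of the block, for comparison with the parent range.
first_address = MEMORYBOT_RANGE[0]
last_address = MEMORYBOT_RANGE[-1]
```

Calling `in_memorybot_range("37.16.72.207")` checks the IP that hit the site; the neighboring .208 and .209 addresses can be checked the same way.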

Note: The bot's versioning is wacky. In the past week, Project Honey Pot participant sites report FIVE different version numbers for the IP that hit me [projecthoneypot.org]:

/1.20.30
/1.20.33
/1.20.37
/1.20.41
/1.20.70
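
Since the version number keeps shifting, matching on the exact UA string is fragile. A rough sketch of pulling the version out of a logged UA with a regular expression follows; `VERSION_RE` and `memorybot_version` are hypothetical names, and the pattern assumes the token keeps the `memorybot/<dotted digits>` shape seen above.

```python
import re

# Matches "memorybot/" followed by a dotted numeric version, e.g. 1.20.71.
VERSION_RE = re.compile(r"memorybot/(\d+(?:\.\d+)*)", re.IGNORECASE)


def memorybot_version(user_agent: str):
    """Return the memorybot version string from a UA, or None if absent."""
    match = VERSION_RE.search(user_agent)
    return match.group(1) if match else None


# UA string as logged in this post.
ua = ("Mozilla/5.0 (compatible; memorybot/1.20.71 "
      "+http://archivethe.net/en/index.php/about/internet_memory1 "
      "on behalf of DNB)")
version = memorybot_version(ua)
```

Matching on the `memorybot/` token alone, rather than a full version, should catch all of the variants listed above.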

Notes:

- Neighboring IPs (.208; .209) show the same versions, and more. E.g.: [projecthoneypot.org]

- The umbrella, internetmemory.org, appears to be more European than not.

- Yet Another All-Web Archive. But apparently not connected to Amazon's archive.org -- yet.

 

aristotle




msg:4681676
 1:39 pm on Jun 21, 2014 (gmt 0)

Can you please describe this "Amazon's archive.org" that you mentioned? It's probably discussed in another thread that I missed, so I don't know about it.

lucy24




msg:4681706
 6:59 pm on Jun 21, 2014 (gmt 0)

TIA doesn't actually belong to Amazon, does it? They just crawl from AWS ranges.

Pfui




msg:4681743
 1:08 am on Jun 22, 2014 (gmt 0)

I don't know about archive.org's crawling bases, AWS or otherwise. That said, a major "Institutional Supporter" of Archive.org is Alexa -- an Amazon company. https://archive.org/about/credits.php

And --

"Alexa's operation includes archiving of webpages as they are crawled. This database served as the basis for the creation of the Internet Archive accessible through the Wayback Machine.[7] In 1998, the company donated a copy of the archive, two terabytes in size, to the Library of Congress.[5] Alexa continues to supply the Internet Archive with Web crawls." [en.wikipedia.org...]

Thus I reckon Archive.org's data, tracking and stats get accessed by Amazon in some way, shape, or form; hence my POV: "Amazon's archive.org".

Sidling back to memorybot -- archivethe.net appears to have different connections.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved