Internet Archive Has 44 petabytes Worth of Data

5:36 pm on Oct 8, 2018 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 9, 2000
posts:25618
votes: 776


We recently discussed how the Internet Archive helped Wikipedia fix a few million broken links. [webmasterworld.com]

Did you know that the Internet Archive holds 44 petabytes' worth of data and adds four petabytes each year?

You can hear the presentation by Mark Graham of the Wayback Machine on SoundCloud:

[soundcloud.com...]
8:28 pm on Oct 8, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 410


Did you know that the Internet Archive has 44 petabytes' worth of your websites' data, and adds four petabytes each year?

FTFY engine :)
8:53 pm on Oct 8, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 13, 2018
posts:185
votes: 39


I don't know if I can share the link, but I once read about a webmaster who closed one of his sites and, after a while, let the domain name lapse (something you should never do). A few weeks after it expired, he found that someone had acquired the domain (not surprising in itself), downloaded the original site from the Internet Archive and put it back online, without even changing the contact information. I bet this happens often, and scrapers certainly love the Internet Archive...
9:34 pm on Oct 8, 2018 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 9, 2000
posts:25618
votes: 776


Webmasters can block the archive if they choose.
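The polite way to ask is a robots.txt rule aimed at the archive's crawler. Below is a minimal Python sketch, assuming the archive's crawler identifies itself with the user-agent tokens `ia_archiver` or `archive.org_bot`. Those names are an assumption to check against the archive's current documentation, not something stated in this thread. The sketch uses the standard library's `urllib.robotparser` to confirm what the rules ask for. robots.txt is only a request: it tells a crawler what you want, it does not stop one that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking archive crawlers to stay out of the whole site.
# ASSUMPTION: "ia_archiver" and "archive.org_bot" are the tokens the archive's
# crawler matches on; verify against the archive's own documentation.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Archive crawlers are disallowed everywhere; everyone else (the "*" group
    # with an empty Disallow) is allowed.
    for agent in ("ia_archiver", "archive.org_bot", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/page.html"))
```

An empty `Disallow:` line in the `*` group means "nothing is disallowed", so only the named crawlers are asked to keep out.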
9:42 pm on Oct 8, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 13, 2018
posts:185
votes: 39


Webmasters can block the archive if they choose.

This is not that easy. In my experience, their crawler respects neither the robots.txt directives nor the noindex tag. I once had to write to them to complain that they were archiving my sites when they shouldn't have been, and they more or less said they didn't know what the problem could be (to their credit, they did answer).

PS: Until recently, I thought the noindex tag was honored by all legitimate crawlers, including the Internet Archive's, but I have learned that it is not.
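For reference, `noindex` lives in a `<meta name="robots" content="...">` tag, and honoring it is entirely up to the crawler that reads the page. Below is a minimal Python sketch of what that check looks like on the crawler's side, using the standard library's `html.parser`. The helper names are hypothetical. It treats `none` as implying `noindex`, and it ignores the HTTP-header form of the directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names;
        # attribute values may be None, so default them to "".
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

Nothing forces a crawler to run a check like this before storing a copy, which is the gap the post describes.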
10:50 pm on Oct 8, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15199
votes: 682


someone had acquired the domain name . . . and downloaded the original site from the Internet Archive and put it back online
I am filled with admiration.

their crawler is respecting neither the robots.txt directives nor the noindex tag
“Block” doesn’t mean put up a sign that says No Admittance, or Employees Only. It means deadbolt the door.
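Deadbolting the door means the server refuses the request instead of asking the crawler to go away. Below is a minimal Python sketch as WSGI middleware that answers `403 Forbidden` when the User-Agent contains a blocked token. The token list and the `site` app are hypothetical stand-ins. The tokens are the same assumed crawler names as a robots.txt rule would use.

```python
# ASSUMPTION: hypothetical User-Agent substrings to refuse; adjust to what
# actually shows up in your own access logs.
BLOCKED_AGENT_TOKENS = ("ia_archiver", "archive.org_bot")


def block_archivers(app):
    """Wrap a WSGI app so matching crawlers get 403 Forbidden instead of the page."""

    def wrapper(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in BLOCKED_AGENT_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapper


def site(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello\n"]


application = block_archivers(site)

if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Local test server only; in production the same rule usually lives in
    # the web server or firewall configuration.
    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

Matching on User-Agent still only stops crawlers that identify themselves honestly. Rules based on source IP addresses taken from your own logs go further, but those addresses are not given here.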