In that spirit, he also has asked Google to furnish him with a copy of its database, say with a six-month delay so Google's competitiveness doesn't suffer.
Google has yet to grant his request. But Kahle hopes the company will come around, especially in light of its claim that it wants to have a positive impact on the world. A Google spokeswoman declined to comment.
Correct me if I'm wrong, but isn't the web already an open archive?
It's open, but it's not an archive. It is unthinkable for an archive to contain only the latest version of a book (rather than all revisions, no matter how minor), and it is also unthinkable for an archive to delete material when it has "gone out of print".
The web is just a current snapshot, whereas an archive (a good one!) is a full-featured movie.
One problem with archive.org is if the bot is *ever* blocked by robots.txt, ALL previously stored pages are deleted.
Not deleted, but made "unavailable." A real-time check is made to see if robots.txt exists before previously stored pages are shown. That doesn't mean they are deleted -- far from it. They're merely waiting until such time that the information can be useful once again.
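For anyone who wants to see how that kind of display-time check could behave, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not archive.org's actual code. The `ia_archiver` user-agent token, the `is_displayable` helper, and the example.com URL are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent token for the archive's crawler (illustrative only).
ARCHIVE_AGENT = "ia_archiver"


def is_displayable(robots_txt: str, url: str, agent: str = ARCHIVE_AGENT) -> bool:
    """Return True if the given robots.txt text lets `agent` access `url`.

    Hypothetical helper: it only models the idea of consulting the site's
    current robots.txt before showing a stored copy.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The original owner blocks the archive crawler from the whole site.
blocking_rules = "User-agent: ia_archiver\nDisallow: /\n"

# A new owner publishes no rules at all (modelled here as empty text).
no_rules = ""

page = "http://example.com/old-page.html"
print(is_displayable(blocking_rules, page))  # False
print(is_displayable(no_rules, page))        # True
```

The second call shows why a check against the live robots.txt only hides stored pages for as long as the block stays in place: once the rules disappear, nothing tells the check to keep the pages hidden.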
I had a domain blocked and then I sold the domain name and moved everything to a new domain. The new owners don't have a robots.txt. Suddenly all the stuff I thought I had been blocking came alive and available on archive.org -- six years worth. I have no control over this because I no longer own the domain.
Archive.org is a slimy operation in my opinion. No respect for the rights of webmasters.
Not deleted, but made "unavailable."
Some are made "unavailable"; some are lost, however. 404s abound for all sorts of reasons. Whatever the reason, it isn't an archive if some items are "unavailable". That is excusable for old-fashioned archives, where old rare books could be subject to restoration procedures, but not acceptable for a digital archive that can generate copies at near-zero cost.
Suddenly all the stuff I thought I had been blocking came alive and available on archive.org -- six years worth
Sorry to hear that, but it is naive to "protect" content using robots.txt. If it was meant for registered (paying) users, then you should have password-protected it.
The way I see the current situation, there is a balanced compromise: pages are either free and can make it into search engines, or paid and kept out of them.
You are going to have to read up on the DMCA, and how to do a proper removal request. If you created that content, and didn't sell the rights to that content when you sold the domain, then you have the copyright on it.
Class action, anyone?
Actually, I think the ROBOTS, NOARCHIVE meta is working. I use that on everything now. Of course, before Yahoo kicked in this year, it was always GOOGLEBOT, NOARCHIVE - which did not work at archive.org.
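To make the ROBOTS vs. GOOGLEBOT difference concrete, here is a rough Python sketch of how a crawler could read those meta tags. It says nothing about what archive.org actually does. The `RobotsMetaParser` class, the `blocks_archiving` helper, and the `ia_archiver` agent name are all made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the comma-separated directives of every named <meta> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if not name:
            return
        tokens = {t.strip().lower() for t in attr.get("content", "").split(",") if t.strip()}
        self.directives.setdefault(name, set()).update(tokens)


def blocks_archiving(html: str, agent: str) -> bool:
    """True if a generic ROBOTS tag or a tag named after `agent` says NOARCHIVE."""
    parser = RobotsMetaParser()
    parser.feed(html)
    for name in ("robots", agent.lower()):
        if "noarchive" in parser.directives.get(name, set()):
            return True
    return False


generic_tag = '<meta name="ROBOTS" content="NOARCHIVE">'
google_only_tag = '<meta name="GOOGLEBOT" content="NOARCHIVE">'

print(blocks_archiving(generic_tag, "ia_archiver"))      # True
print(blocks_archiving(google_only_tag, "ia_archiver"))  # False
print(blocks_archiving(google_only_tag, "googlebot"))    # True
```

Under this reading, a tag addressed only to GOOGLEBOT is invisible to any other crawler, which would fit the observation that it did nothing at archive.org.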
I guess technically, archive.org isn't "indexing."
Highlighted in red:
"Note: Currently only few robots support this tag!"
No mention of the meta tag there in the official RFC.
It doesn't look like the meta tags are officially recognized. Only robots.txt is. And, as you point out, archiving and indexing are not necessarily the same. There may be people who actually want archive.org to archive their site, but not have it appear in search engines. That would make sense for a site that expected almost all of its traffic to come from links on other sites, and not through SEs.
Time to dash off a fax to millionaire whiz-kid Brewster Kahle and ask him to take out all six years worth. Their own evidence gives me a prima facie case, which I wouldn't be able to prove otherwise!