[google.com...]
As far as I know, archive.org respects robots.txt: their crawler fetches it and leaves, and my site is not archived at the Wayback Machine. (I'm not sure about other archives.)
I don't disagree with their idea; it's just that in my case I prefer not to have a copy of my site there (or anywhere), and I'd like to save the bandwidth.
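For anyone who wants to check what their own robots.txt actually tells the archive crawler, here is a rough sketch using Python's standard urllib.robotparser. The "ia_archiver" user-agent name, the sample rules, and the example.com URL are my assumptions for illustration, not something confirmed in this thread, so check your logs for the agent string that really visits you.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one crawler entirely, allows everyone else.
# "ia_archiver" is assumed here to be the archive crawler's user-agent token.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))  # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", page))    # True
```

This only tells you what a well-behaved crawler should do with those rules; it can't tell you whether a given bot actually obeys them.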
So maybe they grab sites and hold them internally for a while before posting them publicly as archive updates.
If others can find even more recent updates, it could dispel the rumor that archive.org isn't behaving like it used to.
If that were the primary intent of archive.org, I would do so in a heartbeat.
I don't believe the third-party selling of data was ALWAYS in the realm of archive.org, was it?
For me this puts archive.org in an entirely different light. IMO there is no difference between the mining and selling they do and what all the others do who use webmasters' resources to generate income from third parties without giving webmasters a share of the profits in return.
On a good point, archive.org might also be used in an emergency as a backup. At least to some extent.
I've even used it myself to gather data from websites which are no longer online.
The copyright verification is a good point as well.
Don
stevenha:
When you have to challenge an unscrupulous webmaster over a copyright violation (for copying your content), showing them the archive on the Wayback Machine usually solves the problem quickly.
wilderness:
On a good point, archive.org might also be used in an emergency as a backup. At least to some extent.
I've even used it myself to gather data from websites which are no longer online.
Good points! I may change my mind about it archiving my site as well.
(also figured out how to do quotes in here lol)
The spider is also looking for Amazon links. They are beginning to integrate the old Alexa into Amazon, and Amazon affiliates are seeing the results of this now.
Google/Yahoo/Froogle watch out.
Pearl