How does G treat Archive.org - old sites and past copyright infringements? - Google Search and SEO forum at WebmasterWorld

It occurs to me that the search engines (SEs) must treat archive.org with very special care.

Consider three cases:

1) You began on the web in '98 with sites at several of the free website providers. Archive.org still has numerous complete copies of each of these near-identical sites, long after you shut them down completely and removed all content. Say, ten sites with the same content.

You moved to your own proper domain in 2000 and reposted much of the text from those early pages. (Hopefully now with far less embarrassing HTML.)

Does G count the "duplicate copies" still in archive.org against your current site? Since you put them on the web earlier, they may be viewed as the more legitimate source, and your newer domain as the copy.

2) A different case: there has never been an older domain with your content, just the archive.org copies of it down the years. Does G count those archive.org copies against your domain? It sounds like a silly question, but I'm not convinced G et al. are incapable of making such a blunder.

3) You had text stolen by infringers; you noticed after a few months and sent DMCA notices to their hosts, who removed the infringing text. Archive.org still has numerous copies of the infringers' sites with your text on them. Does G ignore them, or accumulate them until they eventually trip a duplicate content filter?

Should we go back through our huge list of takedown notices, dig out every single copy still in archive.org, and serve archive.org with a DMCA for each one?

Any definite word from G on this?


Angonasec

