Idea for new algorithm to prevent scraper sites from outranking you


JeffOstroff - 4:27 pm on May 26, 2006 (gmt 0)


adamovic,

I think you have it backwards; your assessment of my advice is awful.

When I look at the 65,000 to 85,000 scraper sites that have targeted our site, most were created this year, some last year or in 2004, and many within the last month.

Many of these scraper sites use automated tools to generate their sites quickly using new domain names. The algorithm I propose would work on the vast majority of scraper sites.

Could a few scrapers sneak by the algorithm by obtaining older domain names? Absolutely! But we are talking about a few, not all of them.

As it turns out, we created our sites in 1998, so the scrapers have their work cut out trying to find domain names older than that to camp out on. We have good links pointing to us, and we have shut down over 40 scraper sites this month alone. That will all help. Whenever we find a scraper site that returns Page Not Found, we submit it to Google’s automated URL removal tool, and 3 days later it is gone.

But to dismiss my idea as awful shows a clear lack of thinking this through. The idea is to remove as many spam results as we can, and certainly this would remove the vast majority of scraper sites.

No single solution can remove 100% of the fraud.

It's getting so bad that almost any search I do on Google these days hardly ever yields what I am looking for. All I get is scraper sites that look like SERP pages, without the content that Google's results say should be there.

Anyway, I stand by my ideas. You can't just pick a few exceptions and claim it won't work.

I like the idea that crobb305 presented above, where they use the cache as well to help filter out scraper sites. That would catch the scrapers who buy older domain names.

Google should buy Archive.org, so they could also bounce the searches off Archive.org to see who has the original content (me), and who has the duplicate content (scammer from Korea). I often use screen shots of Archive.org in my DMCA Cease & Desist letters to web hosts to shut down sites that steal content from us.
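
To make the cache / Archive.org idea concrete, here is a rough sketch of the comparison, not anything Google actually does. It assumes you already have, for each page carrying the same text, the date of its earliest cached or archived copy. The `Snapshot` class, the `first_seen` field, the function name, the URLs, and the dates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Snapshot:
    url: str          # page carrying the duplicated text
    first_seen: date  # hypothetical: date of the earliest cached/archived copy


def split_original_and_copies(snapshots):
    """Return (original, copies): the earliest-seen page, then all the others."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    ordered = sorted(snapshots, key=lambda s: s.first_seen)
    return ordered[0], ordered[1:]


# Illustrative data only; these URLs and dates are invented.
pages = [
    Snapshot("http://scraper.example/page", date(2006, 4, 30)),
    Snapshot("http://original.example/page", date(1998, 6, 1)),
]

original, copies = split_original_and_copies(pages)
print("original:", original.url)
for copy in copies:
    print("duplicate:", copy.url)
```

The earliest archived copy is treated as the original and everything later as a duplicate. Like the domain-age check, this would not catch every case, but it does not depend on how old the scraper's domain name is.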


Thread source: http://www.webmasterworld.com/google/34505.htm