Forum Moderators: Robert Charlton & goodroi
Fact: Scraper sites rarely scrape from just one source; usually they scrape content from 3 or 4 sources, sometimes more.
To detect them, simply look for sites that seem to have content which exactly duplicates the content on several different sites, and contain no content which isn't a duplicate.
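A minimal sketch of that heuristic, assuming you already have the text of candidate source pages to compare against (the function names and thresholds here are made up for illustration): flag a page when nearly all of its word n-grams already appear on pages from several other sites.

```python
# Hypothetical sketch of the detection heuristic described above:
# flag a page as a likely scraper if essentially all of its text
# already appears, verbatim, on pages from several other sites.

def shingles(text, n=8):
    """Set of n-word shingles (word n-grams) for a page's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_likely_scraper(page_text, other_sites_texts,
                      min_sources=3, max_original=0.05):
    """True if the page duplicates content from several other sites
    and contains almost nothing that isn't a duplicate."""
    page = shingles(page_text)
    if not page:
        return False
    matched = set()
    sources = 0
    for other in other_sites_texts:
        overlap = page & shingles(other)
        if overlap:
            sources += 1
            matched |= overlap
    original_fraction = 1 - len(matched) / len(page)
    return sources >= min_sources and original_fraction <= max_original
```

A real system would obviously need near-duplicate hashing (e.g. shingling with minhash) to scale, but the core signal is the same: high overlap with multiple sources, near-zero original content.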
My fear is that Google is hoping the +1 button will help them solve this, thinking that if a site is a scraper site, people won't +1 it and will +1 the original site instead. If so, we are doomed, because that is 100% false. The average internet user not only can't tell the difference, they don't care about the difference even if they could tell. They will +1 the crap out of the first site they see.
...they USED TO BE BETTER AT THIS.
I'm not convinced that the template is a major algorithm factor...
For my part, I really don't understand why it's so hard for Google to detect scraper sites.
[edited by: TheMadScientist at 2:23 am (utc) on Apr 8, 2011]
...(and with randomised elements)...
Facts are not copyrighted
Maybe the whole issue is almost moot now, because in a few days I could probably write some software that rewrites text so differently (and with randomised elements) that one could not see it as scraped or detect that it has been computer generated.
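To illustrate the point (this is not anyone's actual software, just a toy example with a hand-picked synonym table): even a trivial randomised synonym-substitution pass produces text whose word sequences no longer match the source, which defeats any exact-duplicate check.

```python
# Toy "spinner": randomised synonym substitution. The SYNONYMS table
# is invented for illustration; a real spinner would use a thesaurus
# and grammar-aware rewriting.
import random

SYNONYMS = {
    "big": ["large", "huge"],
    "fast": ["quick", "rapid"],
    "said": ["stated", "noted"],
}

def spin(text, rng):
    """Replace each known word with a randomly chosen synonym."""
    out = []
    for word in text.split():
        choices = SYNONYMS.get(word.lower())
        out.append(rng.choice(choices) if choices else word)
    return " ".join(out)
```

Detecting this kind of output requires semantic similarity measures rather than verbatim matching, which is presumably why it is so much harder for a search engine to catch.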
This thread is so painful. Not too long ago (January 28) we were discussing Google's Scraper Update [webmasterworld.com] which Matt Cutts described like this: "The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site's content."
And ... Cutts and Panda stated that they are happy with the results of the Panda update. Did they check the results properly before making such a stupid statement? Something is broken and they didn't even notice it.
falsepositive wrote:
So Google may be sending the signal that we fix our site's quality or suffer the humiliation of being outranked by scrapers...?
bramley wrote:
Maybe the future lies with scrapers - not the low-life rip-off sites that dominate now - but intelligent sites that can give you just what you need in the style you like and all generated on the fly.
chrisv1963 wrote:
And ... Cutts and Panda stated that they are happy with the results of the Panda update. Did they check the results properly before making such a stupid statement? Something is broken and they didn't even notice it.
Brett, the problem is that content copied a year after the original was indexed is ranking above the original. This is the biggest failure of Panda, and Google ignores it in the name of quality.
DMCA is nice if you are talking about one site; often we are talking dozens, and you'd have to be a full-time lawyer to send out all those notices.
Brett_Tabke wrote:
Solution: Only allow the original content to be crawled by Google for the first 48 hours, e.g. cloak it, then release it to the general public after it shows up in Google's index.
[edited by: rlange at 6:13 pm (utc) on Apr 8, 2011]
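The mechanics of that suggestion could be sketched as below. Two big caveats: serving different content to crawlers than to users is cloaking, which Google's guidelines explicitly prohibit, so this only illustrates the idea rather than endorsing it; and the `is_verified_googlebot` flag is assumed to come from elsewhere (Google's documented verification method is a reverse-DNS lookup confirming the crawler resolves to googlebot.com).

```python
# Hedged sketch of the "48-hour embargo" idea: during the embargo
# window only a (separately verified) Googlebot sees the article;
# everyone else gets a placeholder. Illustrative only; this pattern
# is cloaking and violates Google's guidelines.
import time

EMBARGO_SECONDS = 48 * 3600

def visible_content(article_published_at, is_verified_googlebot, now=None):
    """Return what a given visitor should see for this article."""
    now = time.time() if now is None else now
    if now - article_published_at < EMBARGO_SECONDS:
        return "full article" if is_verified_googlebot else "coming soon"
    return "full article"
```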