My worry is that Google will see duplicate content and penalize one of us. My website is usually updated one week *after* the newspaper's website.
Any thoughts on what will happen here?
Google doesn't care who wrote it first or who was the original author. The only determinant is who posted it first and all others get a penalty.
"You will need to add at least 12% original content to the page to prevent it from getting SR penalty (just add extra text at top or bottom of those pages)."
1) Does the content type matter? The newspaper in question prints my story and surrounds it with its own page template, including ads, links to other sections, etc. My page is far simpler and uses my own page template and ads.
So, to a human reader, the content is pretty much the same. The HTML for each, though, is very different. Is that difference enough?
2) Where do you get the 12% number? Is that referenced somewhere?
3) What if the newspaper's page (posted first) is later removed, and my site's page (posted later) is still active? Will the penalty be lifted?
"Google doesn't care who wrote it first or who was the original author. The only determinant is who posted it first and all others get a penalty."
I see why Google does that. But I'm an exception to the normally valid rule, so I'm frustrated. Seems like I can't avoid getting penalized.
Presumably, in exactly the same way they handle duplicate content that is not in the public domain -- that'd be the appropriate approach. The purpose of removing duplicate content from search results isn't to attack copyright violators; it's to return quality search results. In most cases, a user making a particular search doesn't want to find hundreds of copies of the same document. The user generally wants to see one copy of each document, whether it's in the public domain or not.
>> are sites penalized for simply referencing public domain information?
"Penalized" is one way to see it. Another way is that sites that provide content that is not uniquely useful to search users will find it difficult to get those pages indexed and ranked. Again, what value is there, to a search engine or its users, in yet another copy of an already easily found document?
And of course, if one wants to "reference" public domain information (or any other information that already exists on another site), linking to an existing copy is another option.