|How is article duplication viewed in Google News?|
I was hoping someone here could help me a little. I am looking at buying a website that is indexed in Google News, which is crucial.
However, I have noticed that many of the articles on the site appear to be copies of articles from big, famous publications.
Is this a major problem? Is there a good chance of the site losing its Google News status?
As a side note, my intention upon taking over the site is to post original content. But I am worried that this old content will cost the site its Google News indexing, which would in effect mean I have purchased a lemon.
Thanks in advance.
Hi ieconomists, and welcome to WebmasterWorld. As I think you've guessed, Google News doesn't like copied content...
Google News general guidelines
|Journalistic standards. Original reporting and honest attribution are longstanding journalistic values. If your site publishes aggregated content, you will need to separate it from your original work, or restrict our access to those aggregated articles via your robots.txt file. |
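As a rough illustration of the robots.txt option in that guideline, here is a minimal Python sketch that checks whether a rule set keeps a news crawler out of aggregated articles. The `/aggregated/` directory, the example.com URLs, and the helper name `news_crawler_allowed` are assumptions for illustration, and `Googlebot-News` is used here on the assumption that it is the user-agent token Google documents for its news crawler. Python's standard-library parser may not match Google's own matching rules exactly, so treat this as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the news crawler out of an assumed
# /aggregated/ directory while leaving original articles crawlable.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /aggregated/
"""


def news_crawler_allowed(robots_txt, url, agent="Googlebot-News"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/aggregated/story.html", "/original/story.html"):
        url = "https://example.com" + path
        print(path, news_crawler_allowed(ROBOTS_TXT, url))
```

Whether the site you are evaluating separates its copied articles into one directory like this is something you would have to check; if it does not, a single Disallow rule will not be enough.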
Uniqueness is mentioned several times in the guidelines. In your situation, a big concern would be getting reported by the originators of the material.
I certainly wouldn't want to duplicate anything that Rupert Murdoch owned, for example. He's been particularly outspoken about news copying. Understandably, other content originators might also be upset if they see you using their material without permission.
Hmm, this puts me in a very tricky situation. I really want this website, but this could be a deal breaker.
Are we talking about news from press releases (distribution services, such as PRWeb, BusinessWire, etc.) on the existing site, or from other news sites?
Well, as an example, some of the articles are duplicates of articles from Businessweek.
I actually just checked further, and the articles are pretty much copied, but certain words throughout each article have been swapped for synonyms. I don't know whether this changes anything.
Also, I've checked back as far as two months and they have been doing the same thing throughout. They have likely been duplicating these articles for over three months.
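One rough way to put a number on "pretty much copied, with some words swapped" is to compare the two texts word by word. The sketch below uses Python's standard `difflib` module; the function name `word_similarity` and the sample sentences are made up for illustration, and the score says nothing about how Google itself measures duplication.

```python
import re
from difflib import SequenceMatcher


def word_similarity(text_a, text_b):
    """Return a 0..1 similarity ratio based on matching runs of words."""
    words_a = re.findall(r"\w+", text_a.lower())
    words_b = re.findall(r"\w+", text_b.lower())
    # autojunk=False so common words are never discarded on long articles
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


if __name__ == "__main__":
    # Invented sample sentences: the second swaps three words for synonyms.
    original = "The company reported strong profits this quarter and raised its forecast"
    reworded = "The firm reported robust profits this quarter and lifted its forecast"
    unrelated = "Rain is expected across the region tomorrow"

    print(round(word_similarity(original, reworded), 2))
    print(round(word_similarity(original, unrelated), 2))
```

A score close to 1 across a whole article suggests a copy with light substitutions; a genuinely rewritten piece covering the same facts would normally score much lower.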
Can anyone give me advice here? Do you think it's worth the risk of buying this site, considering this?
Also, I just want to add that the responses and warm welcome here have been great. Will definitely be sticking around :) Thanks, all.
The fact that there's some customization makes it look as if a journalist has written the content from a news release. The extent of that re-writing would give me cause for concern.
The site has obviously done enough to pass muster with Google, but that doesn't mean it will stay that way.
If the price is right, go for it and get working on minimizing the duplication.
I don't think Google hates the actual copied content as much as it dislikes not finding anything it hasn't seen elsewhere. I wouldn't focus on the exact order of words anymore, as Google seems to classify pages by meaning and intent more now than ever before.
Most of the time the basic facts will be repeated, and quotes are the norm, so some measure of copying is expected. Every site publishing the story needs to have something to add to it, or it is filtered out as a duplicate.
I'm getting a bit worried now about whether it's indexed at all. It's weird: when I search for some article headlines I find them in Google News, and others I do not.
Is it possible that google news does not index every article a source of theirs publishes?
|Is it possible that google news does not index every article a source of theirs publishes? |
Amazon doesn't get all of its pages indexed in Google, and it's quite likely that a news source doesn't get all of its articles indexed... but I can't say that with any certainty.
You might try running some tests. Identify a site comparable in authority and size to the site you're evaluating, and then run comparable searches. I'd search for headlines or text strings in articles of comparable age, with and without quotes.
Also, try a site:domain type search (i.e., use the site: search operator), in this format...
site:example.com "quoted query string here"
Note that there should be no space between the site: operator and the domain.
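If you have a lot of headlines to check, a small script can build these queries for you. This is only a sketch: the helper names `site_query` and `search_url` are made up for illustration, and it just formats the query and a regular google.com search link for you to open by hand; it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def site_query(domain, phrase=None):
    """Build a site: query, optionally with an exact quoted phrase."""
    query = f"site:{domain}"  # no space between the operator and the domain
    if phrase:
        query += f' "{phrase}"'
    return query


def search_url(query):
    """Return a regular web-search URL for the query."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = site_query("example.com", "quoted query string here")
    print(q)
    print(search_url(q))
```

Running the same headline with and without the quoted phrase gives you both comparisons in one pass.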
It's possible for topical web content, once it's off the front page and has disappeared from current news, not to show up in a search of the entire web... unless it received some external links or the site itself is well optimized. You may not have a way of judging that.
Both the quoted search (for a unique text string) and the site: operator search help narrow that down considerably... and would be a better indication of what's in the index.
I'd search for old articles in the regular google.com, not in news.google.com.