Quite a few of my clients' sites have disappeared from the sj and www3 datacenters... Although the update has not finished yet, I strongly believe their sites will not be included in it. So, I dug in to find what's causing it. After hours of exploration, I found one similarity among these troubled clients' sites... They all seem to have "copied" some content from elsewhere... However, some of them have the permission of the original authors of the articles (mostly new scientific research articles)... Over the past months, I have told their web editors to make sure not to use content directly from other web sites, and they have modified some of the content and layout quite nicely. And they had no problem with Google. Until this mysterious update... IT SEEMS THEY ARE CAUGHT BY A NEW FILTER! So, my guess is that Google has spent quite some time updating their algo, and this new algo contains a much more powerful duplicate-content detector...
For those of you who still can't find your sites on www3 and sj, maybe you want to make sure you do not have duplicated content, even if it is copyright "legit" and had no problem before...
Please do not flame me; this is only my theory, and it may or may not be true.
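Nobody outside the 'plex knows what the new filter actually does, but for illustration, one classic way a duplicate-content detector could work is w-shingling plus Jaccard similarity. A minimal sketch in Python (the function names and the threshold are my own invention, not anything Google has confirmed):

def shingles(text, w=5):
    """Break text into overlapping w-word 'shingles'."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Fraction of shingles two pages share (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def looks_copied(page_a, page_b, threshold=0.5):
    """Flag two pages as near-duplicates; the threshold is arbitrary."""
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold

Two pages that share big blocks of text share most of their shingles no matter how the surrounding layout is shuffled, which would explain why just changing the template wasn't enough to escape this update.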
Question restated:
"If a site is banned for duplicate content, and the owner just gives up the domain and starts a fresh one, can he/she still use the clean (non-duplicated) content from the old site? Does he/she need to shut down the old domain (or delete that clean content from it)?"
But it's good to see many of you support my theory. :) I'm sure Google's new filter can help searchers (that's what they care about anyway), but the spammy results on sj just don't cut it... I hope GG is aware of this.
I was planning to concentrate more on content than on optimization "trends", but I guess I'll have to keep investing precious time in reading and following Google's "rules".
What about duplication within a site, e.g. template pages with small variations between them? Would this be considered duplication?
I would say yes. I have a domain that had 23K+ pages indexed, and after this update it looks like it's down to a mammoth 71 pages. All of the pages share snippets in the body text and are designed with the same template. I also went a bit overboard with some of the internal navigation (i.e. identical lists of links on many pages).
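That would fit the shingling idea from earlier in the thread: template pages that differ by only a few words share almost all their shingles. A self-contained, purely hypothetical demo:

def shingles(text, w=5):
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

# Two "different" pages built from the same template text.
shared = ("this page is part of our product catalog where every item "
          "ships free and comes with a thirty day money back guarantee "
          "plus customer reviews and detailed specs for easy comparison")
page_a = shared + " red widget deluxe overview"
page_b = shared + " blue gadget mini overview"

a, b = shingles(page_a), shingles(page_b)
print(len(a & b) / len(a | b))  # roughly 0.8: the pages look like near-duplicates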
"National Anthem God Save The Queen"
1,280 pages show up that contain copies of the British national anthem. That's a lot of unpenalized duplication.
For example, say you have a page with a portion of copied text and a portion of unique content. Now if someone searched for a keyphrase out of the copied part, there's a fair chance your result may not be shown.
But what if someone did a search for a keyphrase out of the unique part? Would your page still be penalised for the copied part? Do these dupe filters work per page? Per site? Per search? Does this make any sense?
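No one here can answer that definitively, but a per-search filter could in principle work something like this. A purely hypothetical sketch (the function and the whole mechanism are my guess, not anything Google has confirmed):

def suppress_result(query, copied_text, unique_text):
    """Hide the page for this query if the keyphrase only matches
    the copied part of the page, not the unique part."""
    q = query.lower()
    return q in copied_text.lower() and q not in unique_text.lower()

# Reusing the anthem example from above:
copied = "god save our gracious queen long live our noble queen"
unique = "our history of brass band arrangements of the anthem"
print(suppress_result("god save our gracious queen", copied, unique))  # True: hidden
print(suppress_result("brass band arrangements", copied, unique))      # False: shown

Under a scheme like that, the page would vanish only for searches matching the copied text, which would explain pages that rank fine for some phrases but are invisible for others.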
It is a site with a thematic book list ...
For example there are:
A) duplicated structural information
- on each category page there is the same short paragraph duplicated from the homepage, which gives visitors brief information about the site's motivation...
- duplicated navigation elements
- a common footer on each page
B) different views of the same information
- several books are sorted into more than one category
==> the bibliographical information of a book is duplicated several times (at least twice, because there is also a page for each book)
- book reviews also exist twice: first on the page of the book being reviewed, and second on a page compiling all reviews from one reviewer...
I checked the Google index: there are no banned pages on my site, but several pages are omitted from the SERPs - which is not a problem for me, because at least one of those pages is listed for the relevant keyphrases.
It could become a problem if someone else copied my site's content.
Regarding the new indexes (indices?), we see in -sj and -fi that some well-performing sites containing dup content are still doing fine... but the thing is, I wouldn't call their dups spam...
This is tricky, because there can be very legitimate reasons for sites to dup content. "Reprinting with permission" is an old, well-established publishing convention, and I'm not sure that G could differentiate between dup content that is legit versus spammy dups... at least in their algos.
What I do believe is that G may be applying some sort of dampening filter on pages with duped content (older than the tagged original content), but that's nothing new.
The examples I gave were all perfectly legitimate reasons for content to appear on different websites. One of my clients is a very large movie preview site. We get all our reviews from a content management service. Should we be penalized for that?
What if a doctor writes an essay, has it on his site, and gives me permission to post it on my medical website? Should I be penalized?
That is why I see problems with a policy like the one you describe, IF they really are doing it. (Let's all remember no one can confirm it; it's just conjecture.)
My point was this:
If you have a movie review site (assuming the movie review niche is very spammy), and your reviews are similar to your competitors', I would suggest that Google would remove one of your sites from their database (i.e. gray bar).
It would be better for searchers not to find 2 sites in the top 10 that were very similar, thus it would be better for Google to do so.
That is my theory about the new Google strict algo.
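If that theory holds, the mechanics could be as simple as deduplicating at serving time: walk down the ranked results and drop any page too similar to one already kept. A hypothetical sketch (my own illustration, not Google's actual code):

def word_overlap(a, b):
    """Crude similarity: fraction of distinct words two pages share."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def dedupe_serp(ranked_pages, threshold=0.8):
    """Keep a page only if it isn't a near-duplicate of a
    higher-ranked page that was already kept."""
    kept = []
    for page in ranked_pages:
        if all(word_overlap(page, k) < threshold for k in kept):
            kept.append(page)
    return kept

# Two sites running the same syndicated review; only the first survives.
review = "a gripping thriller with a twist ending and superb acting"
serp = [review + " reviewed at site one",
        review + " reviewed at site two",
        "an independent take on the film with original commentary"]
print(len(dedupe_serp(serp)))  # 2: the second copy of the review is dropped

From the searcher's point of view that's an improvement; from the webmaster's point of view, whichever copy Google happens to keep wins, and the other one gets the gray bar.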