Quite a few of my clients' sites have disappeared from sj and www3... Although the update has not finished yet, I strongly believe their sites will not be included in it. So I dug in to find the cause. After hours of exploration, I found one thing these troubled sites have in common: they all seem to have "copied" some content from elsewhere... However, some of them have permission from the original authors of the articles (mostly new scientific research articles)... Over the past months I have told their web editors not to use content from other web sites directly, and they have modified some of the content and layout quite nicely. They had no problems with Google until this mysterious update... IT SEEMS THEY HAVE BEEN CAUGHT BY A NEW FILTER! So my guess is that Google has spent quite some time updating its algo, and the new algo contains a much more powerful duplicate content detector...
For those of you who still can't find your sites on www3 and sj, you may want to make sure you do not have duplicated content, even if it is copyright "legit" and HAD no problems before...
Please do not flame me, this is only my theory, it may or may not be true.
If you search this board, you will find hundreds of discussions about how much duplication may get you banned, and no one can give an exact ratio. Because of the "ban the site, not the pages" spam policy, the price is just too high to test things like this. GoogleGuy obviously will not reveal this inside secret to outsiders either.
However, in my experience, Google was quite generous on this issue UNTIL this update... It seems they are tightening up. For example, before, if 10 of the 100 pages on your site had content duplicated from elsewhere, that was 10% and you were safe. With the new algo, however, you may get kicked out. I still can't tell you how much they have tightened the policy, but I am almost sure they have done something to enforce a stricter duplicate content policy. :)
Maybe we can wait until this update finishes and then do a poll to check whether this is true... So far I have heard from an increasing number of webmasters that their sites have "vanished" from www3... I think this may be one of the causes...
Isn't there one fundamental flaw in your theory? How could Google possibly tell which site has the original content (which other sites have copied)?
If Google penalised the sites that copied then surely there would be a very good chance the original site would get penalised too?
If this were the case, it would be oh-so easy to reduce your competition - simply set up a website and copy all of their content. Sure, this site will get penalised, but so will all your competition. If your theory is true, how would Google get around this?
Does this make sense?
Chris
Thanks for joining this topic. Well, one thing is for sure: Google and many other SEs must have "dup. content filters" installed in order to show the audience clean results. Otherwise, anyone could just copy the #1 positioned site for a particular keyword. Now, how does Google differentiate which is the original? They obviously have their own way, but a simple way would be to see which page has the higher PR or which site has the higher rank (they may keep a database of site credit, like a credit bureau does for our bank accounts etc...)
:) SoHu
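The guess in the post above (keep whichever copy has the higher PR) can be sketched roughly as follows. This only illustrates the poster's theory and says nothing about what Google actually does; the `Page` class, the `pick_by_pr` function, the example URLs and the PR numbers are all made up for the illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    pr: int  # hypothetical PageRank-style score, higher is better


def fingerprint(text: str) -> str:
    # Normalize case and whitespace so trivial formatting changes hash the same.
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def pick_by_pr(pages):
    # Group pages whose normalized text is identical.
    groups = defaultdict(list)
    for page in pages:
        groups[fingerprint(page.text)].append(page)
    # Keep only the highest-PR page from each group of duplicates.
    return [max(group, key=lambda p: p.pr) for group in groups.values()]


# Made-up data: the copier happens to have the higher score.
pages = [
    Page("http://original.example/article", "New research  shows X.", pr=4),
    Page("http://copier.example/article", "new research shows x.", pr=6),
    Page("http://other.example/page", "Something unrelated.", pr=3),
]
kept = sorted(p.url for p in pick_by_pr(pages))
print(kept)
```

With this made-up data the copier's page survives and the original is dropped, because the rule looks only at the score and never at who published first.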
If they are penalising, it seems to be somewhat random, unless the filter hasn't fully kicked in. For example, I pasted some random content from Amazon into a Google search.
[google.com...]
I could give you endless amounts of duplicated content that google is serving up from many different sites.
So where does that leave us? (I'm scratching my head! ;)
Thanks for the URL, it's a very good example. My guess is Google's dup. filter must have some detailed rules, such as: all content on Amazon can be duplicated? :D Book names, song names, people's names, descriptions etc... all can be duplicated? Hehe, would any experienced SEOs on this board like to share their opinions?
Once Again, let's pick GOOGLE's brain... :p
I think your theory is right - I already saw it happen prior to this update. Google seems to recognize 'snippets' and removes from its index all URLs that contain the copied content. Only the original source (URL) remains in the index. I'm afraid none of those URLs will return (so far, at least). My URLs were gone about 4 weeks ago and haven't returned yet, neither in www2 nor in -sj.
The pages are identical depending on the time each is called from my server but they are both in the Google database. It actually has driven me quite mad.
So please tell me how this isn't caught by the dup content filter?
taxpod: I think this has something to do with dynamic content... I once duplicated an entire site using PHP, copying content from all over the place. Since it was only for testing purposes, I was doing all kinds of illegal stuff on it... Believe it or not, since it was PHP, Google actually did not mind, not even the hidden links. :p
(I am not suggesting people go ahead and do these things, and I am sure Google has improved since then, so, do not test your site like this. :))
Now, how does Google differentiate which is the original? They obviously have their own way, but a simple way would be to see which page has the higher PR or which site has the higher rank
I don't think so, brother. They had better do it with a date/cache system or they will have loads of complaints and maybe a few lawsuits.
For example, I've got Copyrighted articles and marketing copy floating around that is included on our main site as well as on higher PR ones that picked it up later.
So, you think they have some right to placement of MY article from MY site just because their PR is higher? Again, I don't think so. Legally, I know so as Copyright law is quite specific. Google had better think about the ramifications, too...
It was just a guess of mine. Obviously Google should have more sophisticated algos for this kind of task. Like I said in other postings, they may take date, time and geo. location into account (the cache does not work, because when the content of a page changes, the cache gets updated...), and they may also have some kind of credit ranking going (something we may not know about) etc... This is only a guess. :)
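To make the two guesses in this exchange concrete, here is a minimal sketch comparing a "first seen wins" rule with a "higher score wins" rule. Everything in it is hypothetical: the crawl log, the dates, the scores and the function names are invented for illustration, and nobody in this thread knows which signals Google really uses.

```python
from datetime import date

# Hypothetical crawl log for one piece of text:
# (url, date the text was first seen at that url, PR-like score)
sightings = [
    ("http://bigsite.example/copy", date(2003, 3, 10), 7),
    ("http://mysite.example/article", date(2003, 1, 5), 4),
]


def pick_by_first_seen(sightings):
    # Treat the URL where the text appeared earliest as the original.
    return min(sightings, key=lambda s: s[1])[0]


def pick_by_score(sightings):
    # Treat the URL with the highest score as the original.
    return max(sightings, key=lambda s: s[2])[0]


print(pick_by_first_seen(sightings))
print(pick_by_score(sightings))
```

On this invented data the two rules disagree: the date rule keeps the small site that published first, while the score rule keeps the bigger site that copied later.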
That said, you can also always file a DMCA complaint through Google if your content shows up on someone else's site in Google's search results. [google.com...]
Since the old domain is not spiderable anymore (or Google is no longer interested in it), it would be safe to reuse some of the "clean" content from the messed-up one, right?
Any people? admins? GG?
To answer your question: dup content will not cause a site to be banned. However, the site will be removed from the index. If the content changes it will reappear. I know this to be the case because I accidentally placed one of my sites' index pages in the wrong folder on my server. Effectively I duplicated the two sites. As a result Google dropped one of the pages (the one with the lower PR). I noticed my mistake, corrected it, and the next month it was back in the index.
I have also tested the results for a keyphrase where I got very spammy results in the past.
These tests have shown that Google omits results with duplicate snippets. (It seems to be a little stricter than in the past.)
This means the pages are still in the index. Being omitted is not a permanent penalty and could be a large effort for the Google SERPs.
But if there is even a minimal difference in the text, for example a different keyword order or one additional word, then the pages are in the SERPs - very hard SE spam, created with doorway pages :-(
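The observation above (one extra word is enough to get past the filter) is what you would expect from a filter that only compares text exactly. The sketch below contrasts an exact comparison with a word-shingle overlap measure, a common near-duplicate technique that is swapped in here purely for illustration. The function names, the sample doorway text and the shingle size of 3 are assumptions, not anything known about Google's filter.

```python
def exact_duplicate(a: str, b: str) -> bool:
    # Exact comparison after normalizing case and whitespace.
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


def shingles(text: str, k: int = 3) -> set:
    # All runs of k consecutive words; very short texts yield one shingle.
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def similarity(a: str, b: str) -> float:
    # Jaccard overlap: shared shingles divided by all distinct shingles.
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


doorway_1 = "cheap blue widgets for sale in our online widget shop today"
doorway_2 = doorway_1 + " now"  # one additional word

print(exact_duplicate(doorway_1, doorway_2))
print(similarity(doorway_1, doorway_2))
```

Here the exact comparison sees two different pages, while the shingle measure shows that 9 of the 10 distinct shingles are shared, so a threshold-based filter could still treat the second doorway page as a near copy.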
Question restated:
"If a site is banned for duplicating content, and the owner just gives up the domain and starts a fresh one, can he/she still use the clean (non-duplicated) content from the old site? Does he/she need to shut down the old domain (or delete the clean content from it)?"
But it's good to see many of you support my theory. :) I'm sure Google's new filter can help searchers more (this is what they care about anyway), but the spammy results on sj just don't cut it... I hope GG is aware of this.