
Is there evidence of a new G duplicate page algo?

I'm not trying to rehash this, just wondering what people have seen



6:52 pm on Jun 15, 2004 (gmt 0)

10+ Year Member

I have read a lot of stuff on duplicate content detection recently. Specifically, several people have speculated that Google's algorithms have been recently beefed up to identify duplicates, presumably hitting in the May update.

If so, I think it might be useful to recap where we think the current state of things is. Consider these questions:

  • Is there real evidence that the dupe identification method has changed?
  • Do exact, or near exact pages on the same site get identified (e.g. www.mydomain.com and mydomain.com, which return the same page)?
  • Do slightly different pages on the same site get identified (e.g. same page, but with some reference to the host name embedded in the content, or the same page except for a dynamic date, so it's a little different tomorrow)?
  • Identical pages hosted on different domains? (e.g. www.mydomain.com and www.mydomain.co.uk)
  • Two search results pages on the same site, same results, just in a different order (e.g. sort by price, sort by popularity, score, etc.)
  • Large blocks of text the same as on other sites (e.g. CNN and FoxNews both getting API news feeds, but with different frames, etc.)
  • Same as above but with non-volatile content (e.g. descriptions of products)
  • Sites on two different IPs not in the same class-C addresses, different whois records and owners, different domains, but otherwise similar or even identical content
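Several of the scenarios above (near-identical pages, pages differing only by a date, shared blocks of syndicated text) are exactly what a shingle-overlap comparison would catch. As a purely illustrative sketch — nobody outside Google knows what method they actually use — here is a w-shingling / Jaccard similarity check in Python:

```python
def shingles(text, w=4):
    """Split text into the set of overlapping w-word shingles (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}

def jaccard(a, b, w=4):
    """Jaccard similarity of the shingle sets of two documents (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def looks_duplicate(a, b, threshold=0.8, w=4):
    """Flag a pair of pages as near-duplicates when similarity exceeds a threshold."""
    return jaccard(a, b, w) >= threshold
```

Under this kind of measure, a page that differs from another only by an appended dynamic date scores close to 1.0, while two pages that merely share one syndicated paragraph inside otherwise different text score much lower — which is consistent with the intuition that boilerplate dates shouldn't save a dupe, but a shared news feed shouldn't doom a whole site.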

Then, what happens when duplicates are identified? If the Google algo did change in the May update, no one can really know this yet, but for completeness...
  • First one found wins, others are still in the index but score so poorly as to be meaningless
  • Penalty applied, perhaps to all instances of duplicates?
  • PR0 for site
  • Longest-standing WHOIS record wins :)

Again, this has all been hashed around a lot for several years here. Lots of good general stuff on how this might be done and what outcomes there may be. But... my hope is that someone has seen evidence that there has been a change (lack of change isn't really evidence, I guess).

Thanks in advance!


7:53 am on Jun 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

sublime1, Good post.

I don't think I have the resources to give a definitive answer to the various issues you raised. But a symptomatic dissection of a similar case, which happened as recently as a week ago, may throw some light on it.

Site - www.example.com (PR 6). Ranked within top 5 for 3 competitive phrases till last week.

It has duplicate sites with the same content at example.com (PR 3), www.example1.com (PR 3) and www.example2.com (PR 1).

www.example.com has 330 pages indexed with Google, example.com has 340 pages, www.example1.com has 113 pages and www.example2.com has 12 pages indexed with Google.

Same whois details for all the sites.

www.example.com is the oldest site, older by 3 years than the other 2 dupes.

Now www.example.com is ranked nowhere in top 1000 for those search phrases.

A search for "example" showed example.com for a while, and is now showing www.example1.com, where "example1" is an entirely different word from "example".


Going by my past experience, Google used to merge duplicate sites into the site with the maximum PR. But now I am left wondering why Google has merged the highest-PR, older site into a lower-PR, younger site. None of the duplicate sites are cross-linked, and their links were obtained naturally, without any link exchange.

Hope understanding this case will help all of us in getting some insight into the duplicate content issue.

Best Regards



8:27 am on Jun 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

> Identical pages hosted on different domains? (e.g. www.mydomain.com and www.mydomain.co.uk)

www.x.com and www.x.co.uk are merged by Google if they point to the same hosting IP address and the files are the same. There is no penalty for this, but only one gets listed, usually the regional variation if there is one.
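A quick way to check the condition described here — two hostnames pointing at the same hosting IP — is a DNS lookup. This is only a diagnostic sketch, since the actual signals Google merges on are unknown; the resolver is injectable so the function can be exercised without live DNS:

```python
import socket

def same_host_ip(host_a, host_b, resolve=socket.gethostbyname):
    """Return True if both hostnames resolve to the same IPv4 address.

    `resolve` is a pluggable lookup function (defaults to a real DNS
    query via socket.gethostbyname), so the logic can be tested offline.
    """
    try:
        return resolve(host_a) == resolve(host_b)
    except socket.gaierror:
        # One of the names didn't resolve; treat as "not the same host".
        return False
```

Calling `same_host_ip("www.mydomain.com", "www.mydomain.co.uk")` against live DNS would tell you whether your .com and .co.uk variants are candidates for this kind of merge, at least under the same-IP theory floated in this thread.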


1:28 pm on Jun 16, 2004 (gmt 0)

10+ Year Member

McMohan -- thanks for this info.

So your whole www.example.com site was "nuked" (from site: and the SERPs?) and replaced with other domains carrying duplicate content? Do you still get into the SERPs, just on a different domain?

If so, this could suggest that:

  • Possibly a new mechanism is being used to determine that sites are the same (e.g. same IP address and/or WHOIS info)
  • Possibly a new algorithm is being used to determine which site wins

Thanks again.


6:02 pm on Jun 16, 2004 (gmt 0)

10+ Year Member

I have several sites that purposely use duplicate content. The reason for the practice is that each site needs general information required by its users. This could be a user guide to widgets. However, what I have done is add to this general dup content specific information about the particular widget that is the theme of the site.
This is done for legitimate reasons, and it does not make sense to divert these visitors to another site with general information, because the specific information is essential.
So far no problems with Google, but it is hard to say because the rankings in my industry are dominated by irrelevant directories. However, will this become a problem later?
Is Google going too far with this type of penalty which targets the few and penalizes the many?


5:25 am on Jun 17, 2004 (gmt 0)

10+ Year Member

allanp73 -- So far there's very little evidence to indicate that Google is actually doing anything tremendously different than in the past (with duplicates), or at least I haven't seen any posts that seem "conclusive" on any major new changes.

Indeed, your post suggests that nothing bad has happened (yet).


5:34 am on Jun 17, 2004 (gmt 0)

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Duplicate content penalty/filter has been around for a long time. It could be that more people are coming online and experimenting and getting burned by it. Certainly there've been many complaints about having one's site copied.

The other one that's going around is heavily interlinked websites. Different ip's and servers won't save you from that one, regardless of what your e-book tells 'ya.


5:35 am on Jun 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member


www.example.com still shows PR 6, all the backlinks and pages. But doesn't rank for its own name "example".

Its dupe, www.example1.com, ranks only for unique keywords such as "example" and "Company Name", but ranks nowhere within the top 1000 for the main keyphrases that www.example.com used to rank for. This may be partly because www.example1.com is a PR 3 site, and IMHO that ain't good enough to rank for such competitive phrases. Perhaps it is another way of penalizing dupe sites: merging the main site into a lower-PR site, quite the other way round.

More such recent examples will help in formulating a pattern.

Best Wishes



5:48 am on Jun 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Were there wild changes in the SERPs just a while back? I see quite different SERPs across 3 sites.



8:44 am on Jun 18, 2004 (gmt 0)

10+ Year Member

Regarding duplicate content on the same site, we've just added some new pages to one of our sites that contain very similar content to existing pages. The new pages are ranking well for their targeted phrases, whereas the older pages have been buried for theirs. In conclusion, it would appear that in this situation Google will penalise the older pages.



9:35 am on Jun 18, 2004 (gmt 0)

10+ Year Member

I have no examples but from what I've read here it seems Google is dropping older sites in favor of newer ones... always providing the searcher with the latest relevant content?
