
Google and duplicate content

6 months of hard-earned experience

7:26 am on Nov 1, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 21, 1999
votes: 0

It always amazes me when I come across a group of developers who have been building what seems to be pretty impressive stuff, but who have failed to figure out the very basics of ensuring that a spider can access a site's content and reward it. When you get it right (not often ;)), it all seems so simple.

With a wry smile, I look back at the last 6 months to a year with one of our clients where we have got it hopelessly wrong.

Here's the scenario:
The client - Markets multiple products in various overlapping industries. Their product range and services scream multiple websites aimed at individual niche markets. The ideal project really!

Our solution:
We built a range of sites all aimed at specific markets. No spam here. Each site has industry and market specific info. All content is driven out of the same content management system. Via a web interface, one can easily manage where the product appears within each site from the same system. No way for a spider to pick up that the content is driven from the same system (all appears to be published statically).

We have made 2 mistakes:
1) Duplicate content within a site. On some of the sites, we displayed the same product range within multiple categories.

2) Duplicate content across websites. Where appropriate, complementary products have been listed in the product sections of the respective sites.

This subject has been covered ad nauseam - first with AltaVista and their patent-pending link dupe checker, and more recently with Google. The million-dollar question has been: at what point do Google and some of the other search engines consider sites' content to be sufficiently dissimilar to avoid a penalty? This time, we were on the wrong side of right and have been penalised accordingly :(

8:32 am on Nov 1, 2002 (gmt 0)

Senior Member fathom from US

joined:May 5, 2002
votes: 109

A common theory around here is 10%; I would buffer that to 20-25%.
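As a rough way to gauge where two pages fall against that sort of threshold, here is a hedged sketch: it compares pages by the overlap of their word shingles. The 3-word shingle size and the 25% cutoff are only the figures tossed around in this thread - nobody outside the search engines knows the real dupe-detection rules.

```python
# Rough page-similarity check using Jaccard overlap of word shingles.
# The shingle size (3) and the 25% cutoff are illustrative guesses only,
# not anything Google has published.

def shingles(text, size=3):
    """Return the set of consecutive word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(page_a, page_b, size=3):
    """Jaccard similarity between two pages' shingle sets (0.0 to 1.0)."""
    a, b = shingles(page_a, size), shingles(page_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_duplicate(page_a, page_b, threshold=0.25):
    """Flag page pairs at or above the (guessed) similarity threshold."""
    return similarity(page_a, page_b) >= threshold
```

Running this over every cross-site page pair would at least show which product pages sit far above the buffer zone and need rewriting first.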

Diversity in online markets doesn't necessarily mean product/service diversity, and this can lead to duplicate content.

On pages that are (IMO) more than 20% similar, you have two courses of action to avoid Google's wrath.

Crosslinking - this can be bad if you overdo it, but there are significant benefits, particularly when it is used to avoid duplication.

An example: news releases. As all the sites are owned by the same client, the corporate news is likely to be the same on all sites.

Only one site should retain the news pages, and all the other sites should link to those pages. If the design layout of each individual site is the same, the graphical imagery of the interface (shell) can still match the appropriate design by using multiple cascading style sheets on the primary web pages, serving up the appropriate style based on where the request is coming from (one domain or another). If design integrity (look and feel) is not that important, this makes your job easier.
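As a sketch of that per-domain stylesheet idea, the shared news pages could pick their CSS file from the Host header of the incoming request. The domain names and file paths below are hypothetical examples, not from the original post:

```python
# Pick a stylesheet based on which domain the request came in on.
# Domain names and CSS paths below are hypothetical examples.

STYLE_BY_DOMAIN = {
    "www.widgets-for-dentists.example": "/css/dental.css",
    "www.widgets-for-vets.example": "/css/veterinary.css",
}
DEFAULT_STYLE = "/css/corporate.css"

def stylesheet_for(host):
    """Return the stylesheet path for the requesting domain."""
    # Strip any port (e.g. "www.example.com:8080") before the lookup.
    domain = host.split(":")[0].lower()
    return STYLE_BY_DOMAIN.get(domain, DEFAULT_STYLE)

def style_link(host):
    """Build the <link> tag the shared news page should emit."""
    return '<link rel="stylesheet" href="%s">' % stylesheet_for(host)
```

The one shared copy of the news content then looks native to whichever site the visitor arrived from, without the text existing at two URLs.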

Unique elements, tags, and attributes in each domain also vary the content, and this can actually account for close to 10%, depending on how innovative you are.

One example used for a client (earth science) is Plate Tectonics and Tectonic Plates.

The scientific community only uses Plate Tectonics, as this defines the actual science.

The general population, not being as scientifically savvy as scientists, commonly refers to Tectonic Plates.

Quite a nice market diversity, and a beautiful way of differentiating normally duplicate pages on different sites through the text content, alt tags, titles (elements and attributes), as well as file names (images, pages, and objects) and directory names.
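A minimal sketch of that variant-term approach (the template fields and site names are illustrative, not from the post): keep one shared page template and substitute the market-appropriate term into the title, body text, alt text, and file name for each site.

```python
# Vary one shared page by market-specific terminology.
# The template fields, site names, and slugs are illustrative examples.

TEMPLATE = {
    "title": "{term} Explained",
    "body": "An introduction to {term} for our readers.",
    "img_alt": "Diagram of {term}",
    "filename": "{slug}.html",
}

VARIANTS = {
    "science-site": {"term": "Plate Tectonics", "slug": "plate-tectonics"},
    "general-site": {"term": "Tectonic Plates", "slug": "tectonic-plates"},
}

def render(site):
    """Fill the shared template with the site's variant terms."""
    terms = VARIANTS[site]
    return {field: text.format(**terms) for field, text in TEMPLATE.items()}
```

Each site ends up targeting the phrasing its own audience actually searches for, while the pages stop being byte-for-byte duplicates.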

In this case it is good to remember that "content" is not reserved for just "text".