
Google News Archive Forum

Google and duplicate content
6 months of hard earned experience
pete

10+ Year Member



 
Msg#: 6512 posted 7:26 am on Nov 1, 2002 (gmt 0)

It always amazes me when I come across a group of developers who have been building what seems to be pretty impressive stuff but have failed to figure out the very basics of ensuring that a spider can access a site's content and reward it. When you get it right (not often ;)), it all seems so simple.

With a wry smile, I look back at the last 6 months to a year with one of our clients, where we got it hopelessly wrong.

Here's the scenario:
The client: markets multiple products in various overlapping industries. Their product range and services scream for multiple websites aimed at individual niche markets. The ideal project, really!

Our solution:
We built a range of sites, each aimed at a specific market. No spam here. Each site has industry- and market-specific info. All content is driven from the same content management system, and via a web interface one can easily manage, from that one system, where each product appears within each site. There is no way for a spider to pick up that the content is driven from the same system (it all appears to be published statically).

We have made 2 mistakes:
1) Duplicate content within a site. On some of the sites, we displayed the same product range within multiple categories.
Example:
website1/category1/product1
website1/category2/product1

2) Duplicate content across websites. Where appropriate, complementary products have been listed in the product sections of the respective sites.
Example:
website1/category1/product1
website2/category2/product1
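Both mistakes leave the same product copy reachable at more than one URL, which can be caught before a spider does. A minimal Python sketch, assuming the published pages are available as a URL-to-HTML mapping: it strips tags with a crude regex, folds whitespace and case, and groups URLs whose text hashes identically. It finds exact duplicates only; the URLs are the placeholders from the post and the page bodies are made up.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash a page's text after crude normalization (illustrative, not a real HTML parser)."""
    text = re.sub(r"<[^>]+>", " ", html)              # drop markup
    text = re.sub(r"\s+", " ", text).strip().lower()  # fold whitespace and case
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical page bodies for the URLs used in the examples above.
pages = {
    "website1/category1/product1": "<h1>Product 1</h1><p>Rugged widget.</p>",
    "website1/category2/product1": "<h1>Product 1</h1>\n<p>Rugged  widget.</p>",
    "website2/category2/product1": "<h1>Product 1</h1><p>Rugged widget.</p>",
    "website1/category1/product2": "<h1>Product 2</h1><p>Light widget.</p>",
}
print(find_duplicates(pages))
```

Hashing only flags pages that are word-for-word the same; near-duplicates need a similarity measure instead.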

This subject has been covered ad nauseam: first with AltaVista and their patent-pending dupe checker, and more recently with Google. The million-dollar question has been at what point Google and some of the other search engines consider a site's content to be sufficiently dissimilar to avoid a penalty. This time, we were on the wrong side of right and have been penalised accordingly :(

 

fathom

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 6512 posted 8:32 am on Nov 1, 2002 (gmt 0)

A common theory around here is that pages need to differ by at least 10%; I would buffer that to 20-25%.
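Nobody outside Google knows how, or whether, it expresses similarity as a percentage. One published near-duplicate technique is w-shingling: split each page into overlapping k-word sequences and compare the two sets with Jaccard similarity. The Python sketch below reports "percent different" that way so a pair of pages can be checked against a 20% buffer; the shingle size k=4 and the choice of Jaccard are assumptions, not any search engine's actual method.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def percent_different(a: str, b: str, k: int = 4) -> float:
    """100 * (1 - Jaccard similarity) of the two texts' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # two empty (or very short) texts: treat as identical
    return 100.0 * (1 - len(sa & sb) / len(sa | sb))


THRESHOLD = 20.0  # the buffered rule of thumb from this thread, not a known Google value

page_a = "Rugged widget for field geologists, supplied with a carry case and manual."
page_b = "Rugged widget for classroom use, supplied with a carry case and lesson plans."
diff = percent_different(page_a, page_b)
print(f"{diff:.1f}% different;", "OK" if diff >= THRESHOLD else "too similar")
```

Shingles are sensitive to word order as well as vocabulary, so rewording a sentence moves the score more than shuffling the same words would.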

Diversity in online markets doesn't necessarily mean product/service diversity, and this can lead to duplicate content.

For pages that are too similar (IMO, less than 20% different), you have 2 courses of action to avoid Google's wrath.

Crosslinking: this can be bad, again, if you overdo it, but there are significant benefits, particularly when it is used to avoid duplication.

An example: news releases. As all sites are owned by the same client, the corporate news is likely to be the same on all sites.

Only one site should retain the news pages, and all other sites should link to those pages. If the design layout of each individual site is the same, the graphical imagery of the interface (shell) can still be matched to the appropriate site's design by using multiple Cascading Style Sheets on the primary web pages and serving up the appropriate style based on where the request is coming from (one domain or another). If design integrity (look and feel) is not that important, this makes your job easier.
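One way to serve the right style "based on where the request is coming from" is sketched below as a minimal Python WSGI app for the shared news page. It assumes the referring site is read from the HTTP Referer header, which browsers may omit, so there is a fallback; the domains and stylesheet paths are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical domains and stylesheet paths, for illustration only.
STYLESHEETS = {
    "www.site-one.example": "/css/site-one.css",
    "www.site-two.example": "/css/site-two.css",
}
# Site one is assumed to be the site that keeps the news pages.
DEFAULT_STYLESHEET = STYLESHEETS["www.site-one.example"]


def stylesheet_for(referer: str) -> str:
    """Map the referring URL's host to that site's stylesheet.

    urlsplit().hostname is lowercased with any :port removed; a missing
    or unknown referrer falls back to the news site's own look.
    """
    host = urlsplit(referer).hostname or ""
    return STYLESHEETS.get(host, DEFAULT_STYLESHEET)


def news_app(environ, start_response):
    """WSGI app for the shared news page: one URL and one copy of the text, per-site styling."""
    css = stylesheet_for(environ.get("HTTP_REFERER", ""))
    body = (
        "<html><head>"
        f'<link rel="stylesheet" href="{css}">'
        "<title>Company news</title>"
        "</head><body><h1>Company news</h1></body></html>"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("Vary", "Referer"),  # the markup depends on the Referer header
    ])
    return [body]


# Call the app directly with a minimal environ, as a visitor arriving from site two.
page = b"".join(news_app({"HTTP_REFERER": "http://www.site-two.example/products/"},
                         lambda status, headers: None))
print(page.decode("utf-8"))
```

To run it as a server, the standard library's `wsgiref.simple_server.make_server(host, port, news_app)` can host it. Only the stylesheet varies; the news text exists at a single URL, which is what keeps it from being duplicated across domains.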

Unique elements, tags, and attributes on each domain also vary the content, and this alone can account for close to 10% depending on how innovative you are.

One example used for a client (earth science) is Plate Tectonics and Tectonic Plates.

The scientific community only uses Plate Tectonics, as this defines the actual science.

The general population, not being as scientifically savvy as scientists, commonly refers to Tectonic Plates.

Quite a nice market diversity, and a beautiful way of varying otherwise duplicate pages on different sites: in the text content, alt tags, and titles (elements and attributes), as well as in file names (images, pages, and objects) and directory names.

In this case it is good to remember that "content" is not reserved for just "text".
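The Plate Tectonics / Tectonic Plates idea can be applied systematically by keeping a per-site vocabulary and deriving the title, alt text, image file name, and page path from it. A minimal Python sketch; the site keys, the product name "Earthquake Poster", and the naming patterns are all hypothetical.

```python
import re

# Hypothetical per-site vocabulary: same product, different audience wording.
SITE_TERMS = {
    "science-site": "Plate Tectonics",
    "general-site": "Tectonic Plates",
}


def slug(text: str) -> str:
    """Lowercase, hyphen-separated form for file and directory names."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def page_variant(site: str, product: str) -> dict[str, str]:
    """Build the site-specific title, alt text, image file name and page path."""
    term = SITE_TERMS[site]
    return {
        "title": f"{product}: {term}",
        "img_alt": f"{term} diagram for {product}",
        "img_file": f"{slug(term)}-{slug(product)}.gif",
        "path": f"/{slug(term)}/{slug(product)}.html",
    }


for site in SITE_TERMS:
    print(site, page_variant(site, "Earthquake Poster"))
```

The body copy still needs its own rewording per audience; this only covers the elements, attributes, and names around it.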
