How to Identify Potential Duplicate Content Problems

Asia_Expat

8:32 am on Jul 27, 2008 (gmt 0)

Over the last few months, I've seen the number of URLs returned by the site: operator (without the /*? hack) increase from around 7,000 to 31,000. That doesn't match the number of URLs my site actually has, and I'm worried a dupe issue might have slipped past my radar... but I can't figure out a way to spot it, since you can only view the first 1,000 results. Do you have any hints or tips for spotting dupe issues that may be there?

Receptional Andy

9:18 am on Jul 27, 2008 (gmt 0)

The two approaches I commonly use as basic tests for on-site duplicate content are to spider the site myself (Xenu's Link Sleuth is free and does a great job of this for smaller sites) and to refine site: searches.
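If you'd rather script the spidering check than eyeball a crawl report, here's a rough Python sketch of the idea: fingerprint each page's visible text and group the URLs that share a fingerprint. It assumes you already have the crawled HTML in a dict keyed by URL. The names `fingerprint`, `find_duplicates` and the sample URLs are made up for illustration, and the tag stripping is deliberately crude, so it will only catch exact duplicates, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash a page's text, ignoring markup, case and whitespace differences."""
    text = re.sub(r"<[^>]+>", " ", html)              # crude tag stripping (assumption: simple HTML)
    text = re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace, normalise case
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose page text is identical after normalisation."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    # Only fingerprints shared by more than one URL indicate duplicates
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl output: the same page reachable with a session parameter
crawled = {
    "http://example.com/widgets": "<h1>Widgets</h1><p>Blue widgets.</p>",
    "http://example.com/widgets?sessionid=123": "<h1>Widgets</h1>\n<p>Blue  widgets.</p>",
    "http://example.com/about": "<h1>About</h1>",
}

for group in find_duplicates(crawled):
    print(group)
```

Any group it prints is a set of URLs serving the same text, which is the kind of thing that can inflate a site: count.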

The two site: search refinements I use most for this are adding the inurl: operator and taking advantage of the fact that the site: operator accepts subfolders and dynamic pages as a 'site'. A few examples:

site:example.com inurl:category
site:example.com -inurl:category
site:example.com/category
site:example.com/dynamic.htm inurl:search
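If you have a lot of folders or URL patterns to check, the queries above are easy to generate in bulk. A small Python sketch, assuming you supply the list of path segments yourself: `site_queries` and `search_url` are hypothetical helper names, and the `https://www.google.com/search?q=` URL format is the ordinary search address rather than any official API. Paste the results into a browser by hand; the sketch doesn't fetch anything.

```python
from urllib.parse import quote_plus


def site_queries(domain: str, segments: list[str]) -> list[str]:
    """Build the three site: refinements for each URL segment."""
    queries = []
    for seg in segments:
        queries.append(f"site:{domain} inurl:{seg}")   # indexed URLs containing the segment
        queries.append(f"site:{domain} -inurl:{seg}")  # indexed URLs without it
        queries.append(f"site:{domain}/{seg}")         # treat the subfolder as a 'site'
    return queries


def search_url(query: str) -> str:
    """URL-encode a query for pasting into a browser (assumed URL format)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for q in site_queries("example.com", ["category", "search"]):
    print(q, "->", search_url(q))
```

Comparing the result counts for the inurl: and -inurl: versions of each segment should show roughly which part of the site accounts for the extra URLs.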

Another good approach to finding dupe content is to search for unique strings of text from the page, within double-quotes. This also works well for off-site duplicates.
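For the quoted-string check, something like this Python sketch can pick a phrase for you. It assumes the longest sentence on a page is the most distinctive one, which is only a heuristic, and `quoted_phrase_query` is a made-up name. It caps the phrase at a dozen words by default to keep the query short.

```python
import re


def quoted_phrase_query(text: str, max_words: int = 12) -> str:
    """Pick the longest sentence in the text and wrap it in double quotes."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    longest = max(sentences, key=lambda s: len(s.split()))  # heuristic: longest = most distinctive
    words = longest.replace('"', "").split()[:max_words]    # drop stray quotes, cap the length
    return '"' + " ".join(words) + '"'


page_text = (
    "Welcome. Our hand-forged copper kettles are tested for three days "
    "before shipping. Contact us."
)
print(quoted_phrase_query(page_text))
```

Add site:example.com to the resulting query for on-site dupes, or run it without a site: restriction to look for copies elsewhere.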

I often use search engines other than Google for finding dupes, since Google filters its results much more heavily than some of its competitors.

You may find some of the threads linked from the Google Hot Topics [webmasterworld.com] section helpful, for instance the ones on the site operator [webmasterworld.com] and the duplicate content overview [webmasterworld.com].

[edited by: Receptional_Andy at 9:21 am (utc) on July 27, 2008]