

Repeated Content and PageRank Dilution


nalin

5:48 pm on Aug 17, 2004 (gmt 0)

10+ Year Member



My company has an ecommerce site with ~600 products and ~50 categories, which generates approximately 18,000 unique pages in Google (actually this number is overkill - I believe that ~2/3 of them are supplemental results due to mangled code fixed ~3 months back).

Obviously there is a lot of duplicated content across this sheer number of pages. For instance:

An individual category is most prominently referenced using X_Y_Z, where X and Y are integers corresponding to the higher-level parent categories and Z corresponds to the category itself. On this "main category page" are links to sort by various criteria and, depending on the category's size, links to pages 2, 3, etc. Formerly the category was sometimes referenced using 0_X_Y_Z, Y_Z, or just Z, as well as X_Y_Z, but I believe this error has been fixed. For a single deep category with only one page worth of items (a category with minimal duplication), some 31 pages are indexed (roughly ten sorting variations, with the rest supplemental from the older variations on X, Y, and Z). This problem is compounded when categories contain more than a page worth of items or when they are not at the deepest level.
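To make the variant forms concrete, here is a minimal sketch of normalizing the older references (0_X_Y_Z, Y_Z, or bare Z) to the canonical X_Y_Z form. The parent table and function names are hypothetical illustrations, not the actual site code:

```python
# Hypothetical parent lookup: category id -> parent category id (0 = root).
PARENTS = {3: 2, 2: 1, 1: 0}

def canonical_path(z):
    """Walk up the parent chain to build the canonical X_Y_Z form."""
    chain = [z]
    while PARENTS.get(chain[0], 0):
        chain.insert(0, PARENTS[chain[0]])
    return "_".join(str(c) for c in chain)

def normalize(ref):
    """Map any variant reference (0_X_Y_Z, Y_Z, or bare Z) to X_Y_Z."""
    parts = [int(p) for p in ref.split("_") if int(p) != 0]
    return canonical_path(parts[-1])  # the last id is the category itself

print(normalize("3"))        # -> 1_2_3
print(normalize("0_1_2_3"))  # -> 1_2_3
print(normalize("2_3"))      # -> 1_2_3
```

The point of a normalization like this (or an equivalent 301 redirect) is that every variant collapses to one URL, so only one copy of each category page ever gets indexed.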

Individual products are referenced by a product id - but, depending on the referring page, they are sometimes also accompanied by a category or manufacturer id. In this manner a minimally indexed product corresponds to at least ten pages (the product I checked this with is quite deep in the site and I suspect the number is higher elsewhere; this was roughly five pages plus five supplementals). Additionally, for each product there are one or more "buy now" pages, which Google sees as cookie-error pages, one or more reviews pages, which are largely unpopulated, and so on.

On top of products and categories there are also lesser pages (which, for instance, list newer widgets) that have a plethora of variations of their own.

The categories and variations above are in one sense non-damaging, specifically because a unique title, breadcrumbs, heading, etc. are generated on the duplicate content to reflect the URL parameters of that content (for instance, a category has phrases like p(Z) (in p(X) - p(Y)), where p(N) is a phrase associated with a category id and X, Y, and Z are as above). In one sense this gives us the ability to rank better on very specific searches because of robust titles.
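As a concrete illustration of the p(Z) (in p(X) - p(Y)) scheme, something like the following would produce those titles. The phrase table and the sort suffix are made up for the example:

```python
# Hypothetical phrase table: category id -> display phrase.
PHRASES = {1: "Widgets", 2: "Blue Widgets", 3: "Small Blue Widgets"}

def category_title(x, y, z, sort=None):
    """Build p(Z) (in p(X) - p(Y)), optionally tagged with a sort criterion."""
    title = "%s (in %s - %s)" % (PHRASES[z], PHRASES[x], PHRASES[y])
    if sort:
        title += " (sorted by %s)" % sort
    return title

print(category_title(1, 2, 3))
# -> Small Blue Widgets (in Widgets - Blue Widgets)
print(category_title(1, 2, 3, sort="price"))
# -> Small Blue Widgets (in Widgets - Blue Widgets) (sorted by price)
```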

On the flip side of the coin is the dilution of our PageRank by passing it to roughly identical content.
Sorting parameters illustrate this well - they append to the title ("widget category (sorted by price)", for example) but give much less bang for the buck from an SEO standpoint, and as such serve only to create filler content. These pages do have some merit in that they all link back to the main variation and generally carry breadcrumbs to the higher-level categories.

Our industry is largely comprised of a handful of generic-type phrases for which we are competitive but not necessarily in first place - we have, almost universally, good incoming links with anchor text corresponding to these phrases, directed at a handful of pages on the site. This handful of pages is extremely heavily linked within the site itself. Getting first place on these phrases is the primary focus of my efforts. I am wondering whether cleaning up the duplicate content (sorting criteria), for instance via robots.txt, will cause PR to disperse in a manner that adversely affects our placement on these phrases.
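For what it's worth, a robots.txt cleanup of the sort variations might look something like the sketch below. This assumes, purely for illustration, that the sort pages are distinguishable by a "sort" parameter in the URL - the real pattern depends on how the site builds those links - and note that the "*" wildcard is a Googlebot extension, not part of the original robots.txt standard:

```
# Hypothetical rule: block sort variations from Googlebot.
# Assumes sort pages carry a "sort" URL parameter; "*" is a
# Googlebot extension to the robots exclusion standard.
User-agent: Googlebot
Disallow: /*sort=
```

Whether blocking these pages disperses or consolidates PR is exactly the question - robots.txt stops crawling of the sort pages, but any internal links still pointing at them would continue to pass weight into excluded URLs.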