|should I de-index 90% of my site?|
Background: I run a site which allows users (read: the public) to sell things. A major part of the outward-facing content on "product detail" pages is the description.
A rather small percentage of my clients are writing customized descriptions, hand-crafted text to describe a product that they created themselves, and that they're only selling via my service. The problem isn't with them.
The majority of the products being sold are "MRR" stuff, things that people have picked up the rights/license to sell, and are reselling. And in the majority of those, the description is copied & pasted verbatim from whatever bulk package they bought, and it's the same 5 paragraphs that you'll find on 20 other sites all over the www.
Why the majority? Because those sellers who have created their own products will have a catalog of 5, 10, maybe 15, self-made quality products. Those who are wholesaling MRR resale packages have often purchased a catalog of 2000+ products, which they resell for pennies apiece hoping to squeeze some profit working in bulk. There are e-books galore polluting the web teaching people how to do that to "earn massive wealth" using that technique... um yeah ha ha. From my POV I can see that the ones getting rich are the former, not the latter.
Pretty obvious duplicate content liability there.
My question is: what should I do about all these pages that are mostly copied text? I want to de-index them all, probably page by page using a <meta> noindex tag. Because the software that generates the product detail pages is highly templated, the easy solution is to noindex *all* of those product detail pages. Just throw them all away. Sifting through other people's content to figure out which descriptions are duplicated is not feasible with the man-hours available.
But that's distressing because the product detail pages make up easily 90% of the site's content, and some of them DO rank well for those few sellers who custom write their descriptions.
Individual pages are ranking fine for long-tail phrases, and that's very good for their sales. But overall, the site as a whole is suffering. For its most coveted query phrase, it once peaked at #9 in the Google SERPs, but has fallen since then to #18. I feel (just a hunch, a gut instinct) that the SEO is suffering because of all the duplicate content.
If I de-index all those product detail pages, I'd be throwing out the baby with the bathwater, so to speak.
Should I do it anyways?
How about a heuristic to noindex product pages from sellers with more than 20 items? Sounds like it would do almost entirely the right thing for you with no manual effort to identify the duplicate.
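A minimal sketch of that heuristic, assuming you can pull (product id, seller id) pairs out of your database. The function names and the pair layout are made up for illustration, and the threshold of 20 is just the number suggested above:

```python
from collections import Counter

# Threshold from the suggestion above; tune it against your own catalog.
MAX_ITEMS_FOR_INDEXING = 20


def sellers_to_noindex(products, threshold=MAX_ITEMS_FOR_INDEXING):
    """Return the set of seller ids whose catalog is larger than `threshold`.

    `products` is an iterable of (product_id, seller_id) pairs
    (a hypothetical layout, adapt it to your schema).
    """
    counts = Counter(seller_id for _product_id, seller_id in products)
    return {seller for seller, n in counts.items() if n > threshold}


def should_noindex(seller_id, flagged_sellers):
    """True if every product page of this seller should carry noindex."""
    return seller_id in flagged_sellers
```

A seller with exactly 20 items stays indexed here, since the test is "more than 20". A prolific seller who writes original descriptions would be caught by mistake, so you would probably want a manual allow-list on top of this.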
I'd write (or consult out for) a script to identify and no-index the duplicate descriptions as a batch process. Long term, perhaps your system could identify a duplicate description when the seller submits it and request an improvement or offer to link to a canonical description.
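As a rough sketch of what such a batch script could look like: this version only catches descriptions that are identical (after normalizing case, punctuation and whitespace) across products in your own catalog. It assumes the same MRR text tends to get listed by several resellers on your site, and it does not check other sites. All names are hypothetical:

```python
import hashlib
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def fingerprint(text):
    """Stable hash of the normalized description."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_duplicate_products(descriptions):
    """Return ids of products whose normalized description occurs more than once.

    `descriptions` maps product_id -> description text (hypothetical layout).
    """
    groups = defaultdict(list)
    for product_id, text in descriptions.items():
        groups[fingerprint(text)].append(product_id)

    duplicates = set()
    for ids in groups.values():
        if len(ids) > 1:
            duplicates.update(ids)
    return duplicates
```

Lightly edited copies will slip through an exact-match fingerprint like this one. Catching those would need a fuzzier comparison, which is more work.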
Concern about how duplicate content affects your Google rankings can be good motivation for differentiating your site from the 20 others with unimproved, copied descriptions, provided you push for something better than an all-or-nothing de-indexing solution.
Hmm, could you offer a dropdown (or checkbox or radio button) with two choices, "Custom made product description unique to this site" and "Generic description (cut/paste from elsewhere)", that sellers would have to select when adding a product? You could have the field default to a placeholder such as "How the product description was generated", forcing them to choose one of the options. Or prompt them to answer the question at the end of the product addition, when saving the new product? Or even default to "cut/paste", in the hope that someone who spent the time writing a unique description would be observant enough to indicate that their description was custom written and unique to your site?
Then you could noindex pages with "cut/paste" option and leave others in the index.
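In the page template that could be as simple as the sketch below. The constant and function names are made up, and it assumes the seller's choice is stored as a short string alongside the product:

```python
# Hypothetical stored values for the seller's dropdown choice.
CUSTOM = "custom"
GENERIC = "generic"


def robots_meta(description_origin):
    """Build the robots meta tag for a product detail page.

    Anything other than an explicit "custom" choice (including a missing
    value) is treated as copied text and kept out of the index.
    """
    if description_origin == CUSTOM:
        content = "index, follow"
    else:
        content = "noindex, follow"
    return '<meta name="robots" content="%s">' % content
```

Using "noindex, follow" rather than "noindex, nofollow" is a choice made for this sketch: the page stays out of the index, but crawlers are still allowed to follow its links to the rest of the site.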
After this implementation you could spot-check a number of "custom made" descriptions to see if this method works (i.e. that products selected as "custom description" indeed have unique descriptions).
You may have some false positives and false negatives (noindexed pages that should have been indexed, or indexed pages that should have been noindexed), but you could then judge from the percentage of correct choices whether the method works.
<added> Of course this could only work for newly added products and would not address your current product pages </added>
I think you should consider, if you noindex a lot of pages on your site, whether you actually will be helping the site's "quality profile" in any substantial way.
The algo is intended to raise the overall user experience on the site. One of the determinations that's been made by Google (after some research) is that dupe or thin content impacts user experience negatively. We don't yet know for sure whether the quality profile is directly based on user behavior on your site, or is index-based, using an algo that is being calibrated by overall user behavior. I think it may be both.
If it's based on user behavior specifically on your site, and users don't like very thin or dupe content, then noindexing pages might not help if, say, very many visitors navigate to dupe or thin pages and then back out of them or jump to another site.
If the algorithm is purely index-based, and there is a predominance of noindexed pages on your site, then you may be losing quality points by the noindexing. This is because your category pages are going to be linking to a lower percentage of pages that Google is able to perceive as high quality. There may be little or no difference in this situation between a lot of very thin pages and a lot of noindexed pages.
In a way, the site as you've described it, where anyone can get listings and add content, is analogous to a directory that accepts anyone and doesn't edit listings. Google likes directories that have editorial standards and maintain them, and this most likely now applies to user-submitted product sites as well.
Conceivably, this means your site may eventually evolve to a site that doesn't accept all listings. You may need to charge for reviews and editing. I'm not sure this is a viable option in your area. One thing I'd consider in handling a transition is whether losing many of the MRR resellers would be a bad thing in the long run.
Thanks for these careful and thoughtful replies. I like aakk9999's suggestion of allowing the user to indicate whether their content is unique. I believe most people would check that box honestly. And if it's accompanied by a blurb about why they should rewrite their descriptions, perhaps some of them would.
After all, these are people who want to succeed. They're not trying to poison the site. They're trying to sell products, and anything I can do to help them do that better, even if it means schooling them in basic SEO, is a win-win.
I no longer think that de-indexing all the pages is a good idea.
If I set the "index_this" as a flag in the database, I can default them all to FALSE. Then, I can manually turn them back on to TRUE. It's easy for me to identify individual sellers who I know have written all their own material, and I can flip all their products on in one go. Those that are left... would have to do it themselves. An email sent out to all the sellers can notify them of the change, and what they ought to do about it.
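Roughly what I have in mind, sketched with Python's built-in sqlite3 module and a throwaway in-memory table. Only the "index_this" flag comes from my plan above; the table layout and function names are invented for the example, and the real schema is different:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        id         INTEGER PRIMARY KEY,
        seller_id  INTEGER NOT NULL,
        index_this INTEGER NOT NULL DEFAULT 0  -- 0 = FALSE: noindex by default
    )
    """
)
# Sample rows: seller 10 has two products, seller 20 has one.
conn.executemany(
    "INSERT INTO products (id, seller_id) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20)],
)
conn.commit()


def enable_indexing_for_seller(conn, seller_id):
    """Flip index_this to TRUE for every product of one trusted seller.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE products SET index_this = 1 WHERE seller_id = ?",
        (seller_id,),
    )
    conn.commit()
    return cur.rowcount


def is_indexable(conn, product_id):
    """True if the product page should be left indexable (unknown ids are not)."""
    row = conn.execute(
        "SELECT index_this FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return bool(row and row[0])
```

The product detail template would then check is_indexable() and emit the noindex tag whenever it comes back false.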
More work than I'd bargained for, but alas most worthwhile things usually are.