Background: I run a site which allows users (read: the public) to sell things. A major part of the outward-facing content on "product detail" pages is the description.
A rather small percentage of my clients are writing customized descriptions, hand-crafted text to describe a product that they created themselves, and that they're only selling via my service. The problem isn't with them.
The majority of the products being sold are "MRR" stuff, things that people have picked up the rights/license to sell, and are reselling. And in the majority of those, the description is copied & pasted verbatim from whatever bulk package they bought, and it's the same 5 paragraphs that you'll find on 20 other sites all over the www.
Why the majority? Because those sellers who have created their own products will have a catalog of 5, 10, maybe 15 self-made quality products. Those who are wholesaling MRR resale packages have often purchased a catalog of 2000+ products, which they resell for pennies apiece, hoping to squeeze out some profit by working in bulk. There are e-books galore polluting the web teaching people how to "earn massive wealth" using that technique... um yeah ha ha. From my POV I can see that the ones getting rich are the former, not the latter.
Pretty obvious duplicate content liability there.
My question is: what should I do about all these pages that are mostly copied text? I want to de-index them all, probably page by page using a <meta> noindex tag. Because the software that generates the product detail pages is highly templated, the easy solution is to noindex *all* of those product detail pages. Just throw them all away. Sifting through other people's content to figure out which descriptions are duplicated is not feasible with the man-hours I have available.
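To make the "easy solution" concrete, here is a rough sketch of what the blanket approach could look like inside a page template. It is not the site's actual software: the function names and the "product_detail" page-type label are made up for illustration. The only real-world piece is the robots meta tag itself.

```python
from html import escape

# Hypothetical page-type labels; the real template system will have its own.
NOINDEX_PAGE_TYPES = {"product_detail"}


def robots_meta_tag(page_type: str) -> str:
    """Return a robots <meta> tag for the page's <head>, or "" to allow indexing."""
    if page_type in NOINDEX_PAGE_TYPES:
        return '<meta name="robots" content="noindex">'
    return ""


def render_head(title: str, page_type: str) -> str:
    """Build a minimal <head>, adding the noindex tag only where it applies."""
    parts = [f"<title>{escape(title)}</title>"]
    tag = robots_meta_tag(page_type)
    if tag:
        parts.append(tag)
    return "<head>" + "".join(parts) + "</head>"
```

Since every product detail page comes from the same template, a single check like this would cover all of them at once, which is exactly why it is the easy option and also why it can't tell the hand-written descriptions from the copied ones.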
But that's distressing because the product detail pages make up easily 90% of the site's content, and some of them DO rank well for those few sellers who custom write their descriptions.
Individual pages are ranking fine for long-tail phrases, and that's very good for their sales. But overall, the site as a whole is suffering. For its most coveted query phrase, it once peaked at #9 in the Google SERPs, but has fallen since then to #18. I feel (just a hunch, a gut instinct) that the SEO is suffering because of all the duplicate content.
If I de-index all those product detail pages, I'd be throwing out the baby with the bathwater, so to speak.
Should I do it anyway?