Why not add more relevant content to the affected pages?
How were these pages translated?
I think we will indeed try to noindex the pages, not only at the page level, but also via a robots.txt disallow directive.
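For reference, the page-level noindex would be either a meta tag in the head of each affected page or an X-Robots-Tag response header, something along these lines:

  <meta name="robots" content="noindex">

or, as an HTTP header:

  X-Robots-Tag: noindex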
The pattern is pretty undeniable. Optimized, separate pages work better.
So I am wondering whether the various Google global divisions have different definitions of what constitutes "thin content".
Whoa ... there's a trap here. If you block a page via robots.txt, the spiders won't be able to read the noindex instruction and won't know to take the page out of the index!
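To make the trap concrete: suppose the disallow rule from the plan above looks like this (the path is only a placeholder):

  User-agent: *
  Disallow: /translated/

Once that rule is live, Googlebot stops fetching anything under /translated/, so a meta robots noindex or an X-Robots-Tag: noindex header on those pages is never read, and the URLs can linger in the index as URL-only listings. If the goal is deindexing, leave the pages crawlable until the noindex has actually been processed, and only consider blocking them after that.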
My advice would be to try to eliminate the duplication by combining pages. Make your content richer, and make navigating your site more rewarding on a variety of levels. Assume a range of user intent and experience, and provide good background material for all likely users.
Remove the pages entirely (serve 410 Gone) if you are giving up on getting them indexed. That would go a long way toward strengthening Google's confidence in the site as a whole.
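If you go the 410 route, the status can be served straight from the server config; for example (assuming Apache or nginx, with /translated/ as a placeholder path):

  # Apache (mod_alias)
  Redirect gone /translated/old-article.html

  # nginx
  location /translated/ { return 410; }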
We do have roughly 2-3 paragraphs of unique content per page, but relative to article lengths of roughly 1,500-2,000 words, that must not be enough in Google's eyes.
The problem is that the volume of queries is so large that it has been challenging to figure out a structure that still captures them.
Giving an entire site a penalty because you don't like some pages is silly; just don't rank the pages you dislike.