We've just done a lot of analysis on the dup issues with G, and I want to post an observation about just one aspect of the current problems:
Even within a single site, when pages are deemed too similar, G is not throwing out just the dups; it's throwing out ALL the similar pages.
The result of this miscalculation is that high-quality pages from leading, authoritative sites, some of which also act as hubs, are lost in the SERPs. In most cases, these pages are not actually penalized or pushed into the Supplemental index. They are simply dampened so badly that they no longer appear anywhere in the SERPs.
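Nobody outside G knows how its duplicate filter actually works, so what follows is only a hypothetical Python sketch of the distinction I'm drawing. It assumes near-duplicates are found by comparing 3-word shingles with Jaccard similarity and grouping pages above a made-up threshold; the URLs, page text, threshold and "quality" scores are all invented for illustration. `keep_best` is what I'd expect a dup filter to do (keep the strongest page in each cluster); `drop_all` is what we appear to be seeing (every page in a cluster of similar pages vanishes).

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clusters(pages: dict[str, str], threshold: float) -> list[set[str]]:
    """Group URLs whose shingle sets are at least `threshold` similar (union-find)."""
    parent = {url: url for url in pages}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    sh = {url: shingles(text) for url, text in pages.items()}
    for a, b in combinations(pages, 2):
        if jaccard(sh[a], sh[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters
    groups: dict[str, set[str]] = {}
    for url in pages:
        groups.setdefault(find(url), set()).add(url)
    return list(groups.values())


def keep_best(groups: list[set[str]], score) -> set[str]:
    """Expected behaviour: keep the highest-scoring page from each cluster."""
    return {max(g, key=score) for g in groups}


def drop_all(groups: list[set[str]]) -> set[str]:
    """Apparent behaviour: every page in a multi-page cluster disappears."""
    return {url for g in groups if len(g) == 1 for url in g}


# Hypothetical site: a main bee page, a very similar bee subpage, an unrelated page.
pages = {
    "/bees": "bees are flying insects closely related to wasps and ants",
    "/bees/honey": "bees are flying insects closely related to wasps and ants honey",
    "/ants": "ants live in colonies",
}
score = {"/bees": 0.9, "/bees/honey": 0.4, "/ants": 0.5}  # invented quality scores

groups = clusters(pages, threshold=0.8)
print(sorted(keep_best(groups, score.get)))  # ['/ants', '/bees']
print(sorted(drop_all(groups)))              # ['/ants']
```

Under the first policy, searchers still find the main bee page; under the second, the site has nothing on bees at all, which is exactly the complaint here.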
The current problem is actually not new IMHO. It began surfacing on or about Dec 15 or 16 of last year. At that time, the best page for the query simply seemed to take a 5-10 spot drop in the SERPs... enough to kill most traffic to the page, but at least the page was still in the SERPs. If there were previously indented listings, those were dropped way down.
From early Feb through about mid March, the situation was corrected and the best pages for specific queries were again elevated to higher rankings. When indented listings were involved, however, the indented listing now seemed less relevant than it had been before Dec.
From mid March to about mid May, the situation worsened again, roughly back to the problems seen in mid Dec.: the most relevant pages dropped 5-10 spots, and indents vanished as they had in Dec.
But the most serious aspect of the problem began in mid May, when G started dropping even the best page for the query out of the visible SERPs.
A few days ago, the problem worsened, reaching deeper into the ranks of high-quality, authoritative sites. This added fuel to what has become the longest non-update thread [webmasterworld.com] I've ever seen.
Why This is Such a Problem
The short answer is that a lot of very useful, relevant pages are no longer being featured. I'm not talking about just being downgraded. They're nowhere.
Now, I'm sure that there are sites that deserved the loss of these vanished pages. But there are plenty of others whose absence is simply hurting the SERPs. There is a difference, after all, between indexing the world's information and making it available.
What is an Affected Site To Do?
One option, presumably, would be to stop allowing the robots to index the lesser pages that are 'causing' the SEs to drop ALL the related pages. But this is a disservice to the user, especially at a time when GG has gone on record as taking pride in delivering highly relevant results, particularly for long-tail terms.
Should we noindex all the bee subpages, so that at least searchers can find SOME page on bees from this site? (I'm assuming that noindexing or nofollowing the 'dup' pages that are not really 'dup' pages at all would nonetheless free the one remaining page on the topic to resurface; perhaps a bad assumption.)
In any case, I refuse. Talk about rigging sites simply for the purpose of ranking. That's exactly what we're NOT supposed to be doing.
G needs to sort this out. ;-)
Note: Posters, please limit comments to the specific issues outlined in this thread. There are a lot of dup issues out there right now. This is just one of them.