MikeNoLastName - 11:11 am on Nov 26, 2011 (gmt 0)
Actually we have BOTH issues simultaneously in that section of our site: the writer is pulling together a paragraph or two from each of 15-20 announcements from various other published and unpublished public relations sources and publishing the collection in a subdirectory of our site, AND then syndicating the whole collection to other sources, who use all or part of it for online and offline re-publication.
And NO, I'm no longer sure this is what was causing our largest issue. Since we removed the other "(not-so) obvious" Google-duplicated pages (mentioned a few posts above), we have been steadily re-surfacing daily. The progress report is looking up, although our home page is so far only half-way back in the index. I expect it will take time, as it took 7 months to drop to current levels. But pages which had dropped to pg 25+ in the SERPs are now up to pg 3-6 and still rising daily. I think it just takes time for the algo to iteratively reapply the lost PR across all the pages.
I'm currently of the opinion that onsite duplication is the primary issue in Panda. It could be duplications that have been around for years (or newly PERCEIVED duplications, as in our case with G mistakenly re-indexing pages, see prior posts) which are only now being given far stronger, leveraged weight. Previously, if you had accidentally copied a page to the wrong directory years ago, and the copy was being picked up by your sitemap.xml and indexed by G, it only penalized THAT page in the rankings and you scarcely noticed it. Nowadays it seems they've multiplied the effect, and if you get even a handful of those on a large site, the entire site is multiply-penalized across the board. My guess is they at least eliminate most or all PR pass-through from both of those pages. Make it, by chance, the high PR home page and one or two other high PR pages, and suddenly you're leveraged into the dumpster.
My recommendation: If you are intentionally (or unintentionally or stupidly, like us in one instance) generating redundant pages with only a word or two different on each to (intentionally or unintentionally) gain "content" and +1 PR juice each - STOP! Combine them if possible. If not, go through your site with a fine-toothed comb and look for any pages that are roughly 90% (my guess at a threshold) or more duplicated in the same or any other directory on the SAME DOMAIN, before worrying about syndication to other domains. Once you find them, remove them or noindex them, then re-submit the highest ranked ones using the crawl-as-googlebot/URL-submission feature and clean out the rest with the URL-removal feature of Webmaster Tools for fastest turnaround. Then be patient, it could take a couple of weeks.
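For anyone who wants to automate the fine-toothed comb part, here is a rough Python sketch of how the same-domain check could be done. It is only an illustration, not how G measures duplication: the function names, the idea of comparing word sequences with the standard library's difflib, and the 0.9 cutoff (taken from my 90% guess above) are all assumptions of mine. It also assumes you have already extracted the visible text of each page into a dict keyed by URL path.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 similarity ratio computed over whitespace-separated words."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False keeps difflib from ignoring very common words on long pages.
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def find_near_duplicates(pages: dict, threshold: float = 0.9) -> list:
    """Compare every pair of pages and return those at or above the threshold.

    `pages` maps a URL path to that page's extracted text (hypothetical input;
    how you crawl and strip the HTML is up to you).
    """
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(sorted(pages.items()), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 3)))
    return pairs
```

Comparing every pair gets slow on a large site, so in practice you would probably want to run it one section at a time, or only on pages listed in your sitemap.xml. Anything it flags still needs a human look before you remove or noindex it.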