I appreciate this subject has been raised before, but no definitive answer seems to have emerged. We have a number of sites that legitimately carry duplicate content across them (the same news is presented to different industry channels), and we believe this duplication is the cause of the massive June 27 penalties. Removing the duplicate pages is not really an option, but if we mark the duplicates with a robots noindex tag, would that "remove" the duplication problem in Google's eyes?
One site with just 50,000 "real" pages was exposing 750,000 distinct URLs to search engines. After robots exclusions were applied, Google deindexed all of the alternative URLs, and the canonical URL for each piece of content now ranks a little higher as well.
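For anyone wanting to spot-check the same thing on their own pages, here is a rough sketch (not a definitive tool: it assumes the requests and beautifulsoup4 packages are installed, and the two example URLs are made-up placeholders) that fetches each duplicate URL and reports whether it carries a robots noindex directive or a canonical link pointing at the preferred page.

# Sketch: for each duplicate URL, report whether the page carries a robots
# noindex directive or declares a rel="canonical" link.
# Assumes the requests and beautifulsoup4 packages; the URLs are placeholders.
import requests
from bs4 import BeautifulSoup

urls = [
    "https://example-channel-a.com/news/story-123",
    "https://example-channel-b.com/news/story-123",
]

for url in urls:
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")

    # Looks for <meta name="robots" content="noindex, follow">
    robots = soup.find("meta", attrs={"name": "robots"})
    noindex = bool(robots and "noindex" in robots.get("content", "").lower())

    # Looks for <link rel="canonical" href="...">
    canonical = soup.find("link", rel="canonical")
    canonical_href = canonical.get("href") if canonical else None

    print(f"{url}\n  noindex: {noindex}\n  canonical: {canonical_href}")

Either signal addresses the duplication without removing the pages themselves: noindex drops the duplicate from the index entirely, while a canonical link consolidates the duplicates onto the preferred URL.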