I am suffering from duplicate content penalties across my network of sites, even though running the same articles on multiple sites (for different audiences) is, editorially, a perfectly legitimate practice.
However, Google has now taken over from the reader as king, sadly. To overcome the penalties while leaving the pages in place, I have opted to mark the duplicates with NOINDEX meta tags. Googlebot is gradually picking these up and dropping the pages from the index, which should eventually give me a duplicate-free network of sites in Google's eyes.
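In case it helps anyone reading, here is a minimal sketch of how that tagging could be scripted -- the "duplicates" folder and file handling are purely illustrative, not my actual setup:

from pathlib import Path

NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'

def add_noindex(html_path: Path) -> None:
    """Insert the robots noindex tag into <head> if the page lacks one."""
    html = html_path.read_text(encoding="utf-8")
    if 'name="robots"' in html.lower():
        return  # page already carries a robots directive
    html_path.write_text(
        html.replace("<head>", "<head>\n  " + NOINDEX_TAG, 1),
        encoding="utf-8",
    )

for page in Path("duplicates").glob("*.html"):  # placeholder folder of dupes
    add_noindex(page)

The sketch uses "noindex, follow" rather than "noindex, nofollow", so that links on the dropped pages can still be followed through to the rest of the site.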
I am also in the process of setting up sitemaps for each site. The question now is: Option 1 - do I include the NOINDEXed pages in the sitemaps (strange as that seems), because I still need Google to crawl them to see that they have been NOINDEXed? Or Option 2 - do I omit the NOINDEXed pages, giving a "true" representation of my slimmed-down sites, and hope that they still get crawled through ordinary spidering? There are still good links to these NOINDEXed pages on the sites.
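To make the choice concrete, here is a rough sketch of Option 2: generating the sitemap from only the pages I want indexed. The URL list and the noindexed set below are placeholders, not real pages:

from xml.sax.saxutils import escape

all_pages = [
    "http://www.example.com/articles/widgets.html",
    "http://www.example.com/articles/widgets-dupe.html",  # NOINDEXed copy
]
noindexed = {"http://www.example.com/articles/widgets-dupe.html"}

entries = [
    "  <url><loc>%s</loc></url>" % escape(url)
    for url in all_pages
    if url not in noindexed  # Option 2: leave NOINDEXed pages out
]

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + "\n".join(entries)
    + "\n</urlset>\n"
)

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write(sitemap)

Option 1 would be the same loop without the filter, i.e. every page goes into the sitemap, NOINDEXed or not.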
I decided to use the Google removal tool on the NOINDEX pages, which I hoped would address the issue of bringing them to Google's attention. I then removed them from my Google sitemaps -- I list only *valid and good* URLs there -- and resubmitted the sitemaps.
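For the resubmission step, a sketch of pinging Google with the regenerated file is below. The sitemap URL is a placeholder, and the ping endpoint is the one I understand Google has documented for sitemap submissions, so it is worth verifying it is still accepted before relying on it:

import urllib.parse
import urllib.request

sitemap_url = "http://www.example.com/sitemap.xml"  # placeholder URL

# Ask Google to re-fetch the sitemap rather than waiting for the next visit.
ping = "http://www.google.com/ping?sitemap=" + urllib.parse.quote(sitemap_url, safe="")

with urllib.request.urlopen(ping) as resp:
    print(resp.status, resp.reason)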
After 4 days I'm seeing a slight improvement, but nothing yet to prove I've got the answer.