I have an opportunity to license content that's relevant to my niche from a print magazine with no appreciable web presence. Some of the content in question has been published online before by third parties, e.g. when the magazine runs a favorable review of a widget, the widget manufacturer sometimes transcribes the review and publishes it on their own site.
If my licensing deal goes through, I'd be publishing about 100 pages of content (meaning, 100 widget reviews), maybe one-third of which Google has already seen on the web at vendors' sites.
My version of the content would be official, but Googlebot wouldn't know that -- having seen the same text elsewhere, Googlebot would conclude that I'm publishing duplicate content.
Should I "noindex" the pages that have already been published elsewhere?
Interesting situation. Normally, I don't worry much about duplicating content from other sites when I'm placing it on an already strong site. I just let Google do its job and filter as it decides to filter. Sometimes that still means traffic to those dupe pages on the site I'm working with.
But you might have an issue with quantity here - publishing 100 new pages with 30 or more of them duplicates just might be a problem. I like the idea of noindexing them, at least at first, until the rest of the pages are indexed and ranking. Then, if it seems like there's search potential for some of those noindexed pages, you might try a gradual release. After all, you do have the legitimate license to publish on your site.
If you can build some backlinks to those new articles, that might help Google like you as the source. And if these 100 new pages are just a drop in the bucket for the total of pages on your site, then you might well have no problems at any rate.
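For what it's worth, here is a rough sketch of how the "noindex at first, release gradually" idea could be wired into page templates. It's only an illustration: the Review class, the published_elsewhere and released flags, and the robots_meta helper are all made-up names, and it assumes your pages are rendered by something that can drop a tag into the <head>. The one real piece is the robots meta tag with content="noindex", which is the standard way to ask search engines to keep a page out of the index.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """One licensed review page (hypothetical model, for illustration only)."""
    slug: str
    published_elsewhere: bool  # True if a vendor has already posted this text
    released: bool = False     # flip to True when you decide to let it be indexed


def robots_meta(review: Review) -> str:
    """Return the robots meta tag to place in the page <head>, or "" for none.

    Pages that already exist elsewhere stay noindexed until they are
    explicitly released; every other page gets no tag, so the default
    (indexable) behavior applies.
    """
    if review.published_elsewhere and not review.released:
        return '<meta name="robots" content="noindex">'
    return ""


if __name__ == "__main__":
    reviews = [
        Review("widget-a", published_elsewhere=True),
        Review("widget-b", published_elsewhere=False),
        Review("widget-c", published_elsewhere=True, released=True),
    ]
    for r in reviews:
        print(r.slug, "->", robots_meta(r) or "(indexable, no tag)")
```

The gradual release then amounts to setting released to True on a few of the held-back pages at a time and watching how they do before releasing more.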