What we would typically recommend is to just go ahead and let the Googlebot crawl those pages and then de-duplicate them on our end. Or, if you have the ability, you can use site architecture to fix any duplication issues in advance. If your site is 50 percent KML files or you have a disproportionately large number of fonts and you really don't want any of them crawled, you can certainly use robots.txt. Robots.txt does allow a wildcard within individual directives, so you can block them. For most sites that are typically almost all HTML with just a few additional pages or different additional file types, I would recommend letting Googlebot crawl those.
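For anyone who wants to sanity-check wildcard rules before putting them in robots.txt, here is a rough sketch in Python. It assumes Google-style matching, where `*` matches any run of characters, a trailing `$` anchors the end of the URL, and rules otherwise match as prefixes. The function names and the sample patterns (`/*.kml$`, `/fonts/*.woff`) are made up for illustration, and the sketch ignores Allow lines and rule precedence entirely.

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Convert a robots.txt path pattern to a compiled regex.

    Assumes Google-style semantics: '*' matches any sequence of
    characters, a trailing '$' anchors the end of the URL, and the
    pattern otherwise matches as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path: str, disallow_patterns: list) -> bool:
    """Return True if any Disallow pattern matches the path.

    Simplification: Allow rules and longest-match precedence are ignored.
    """
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

# Hypothetical rules, equivalent to this robots.txt fragment:
#   User-agent: Googlebot
#   Disallow: /*.kml$
#   Disallow: /fonts/*.woff
DISALLOW = ["/*.kml$", "/fonts/*.woff"]

for path in ["/maps/city.kml", "/maps/city.kml?x=1", "/fonts/a/b.woff2", "/index.html"]:
    print(path, "blocked" if is_blocked(path, DISALLOW) else "crawlable")
```

Note that the `$` on the KML rule means a URL with a query string after `.kml` is not caught, while the font rule without `$` also catches `.woff2`, because it matches as a prefix.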
[edited by: alika at 11:50 am (utc) on Apr 22, 2011]
So, if the AdSense theory is correct, where is the sense in a company with responsibilities to its shareholders encouraging the destruction of one of its biggest earners?
I find it hard to believe that my quality score gets affected by a stray AdSense unit when all of those sites' SE traffic doubled after Panda.
[edited by: kd454 at 2:23 pm (utc) on Apr 22, 2011]
That said, the ad factor is definitely worth exploring. Hopefully we'll see some more data from everyone else.
Causation or correlation: for the original Panda seed sites, one of the questions asked was "does this site have too many ads?" And now AdSense comes with 100% different suggestions. Do sites with blocks of AdSense above the fold scream 'quality'?
So do the math.
I think it's time for a group hug.
Try thinking "must have content above the fold" rather than "must not have ads above the fold"