I have an ecommerce site with hundreds of thousands of pages. Probably 10% of the products have PDF files from the manufacturer with lots of additional info, and I link to the PDF from the product page. So if the product page is mydomain.com/widget123.htm, the link to the PDF goes to mydomain.com/pdfs/widget123.pdf. Of course, lots of my ecommerce brethren have access to and use the same PDF files, and they're also on the manufacturer's website. So rather than be just another guy with the same exact content, I excluded the entire /pdfs/ directory in my robots.txt. That amounts to tens of thousands of pages.
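For reference, the rule is just a blanket disallow on the one directory (domain is a placeholder, obviously):

    User-agent: *
    Disallow: /pdfs/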
Am I overdoing it? Should I unblock them and just let Google sort it out? Do you think having so many pages restricted is a red flag?
I actually had a secondary reason for restricting them: if they did rank, users who landed on them wouldn't be able to click through to my site. I suppose I could edit them all and add a link back to my home page...
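If I ever went that route, I'm guessing something like this with pypdf could batch-stamp a clickable home-page link onto each file. Totally untested sketch; the library choice, the rectangle coordinates, and the paths are just placeholders from my example above:

    # rough sketch (untested): add a clickable home-page link annotation
    # to the first page of every PDF in the /pdfs/ directory
    from pathlib import Path
    from pypdf import PdfReader, PdfWriter
    from pypdf.annotations import Link

    HOME_URL = "https://mydomain.com/"  # placeholder domain

    for pdf_path in Path("pdfs").glob("*.pdf"):
        reader = PdfReader(pdf_path)
        writer = PdfWriter()
        writer.append(reader)  # copy all pages as-is

        # clickable rectangle near the bottom-left of page 1;
        # coordinates are in PDF points and would need tuning per file
        link = Link(rect=(36, 20, 300, 40), url=HOME_URL)
        writer.add_annotation(page_number=0, annotation=link)

        # write to a sibling file rather than overwriting in place
        with open(pdf_path.with_suffix(".linked.pdf"), "wb") as f:
            writer.write(f)

One catch: a link annotation is just an invisible clickable area, so I'd probably also need to overlay visible text ("Back to mydomain.com" or similar) for anyone to actually click it.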