Forum Moderators: Robert Charlton & goodroi

Manufacturer's Product Detail PDF Files - disallow in robots.txt?

wingslevel

1:59 pm on Jan 15, 2011 (gmt 0)

I have a site with hundreds of thousands of pages. Ecommerce. Probably 10% of the products have PDF files from the manufacturer with lots of additional info. I have a link on the product page to the PDF file. So, if the product page is mydomain.com/widget123.htm, then the link to the PDF goes to mydomain.com/pdfs/widget123.pdf. Of course, lots of my ecommerce brethren have access to and use the same PDF files, and they are also on the manufacturer's website. So rather than be just another guy with the same exact content, I excluded the entire /pdfs/ directory in my robots.txt. That amounts to tens of thousands of pages.
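For reference, the exclusion described above would look something like this in robots.txt (the directory name matches the post; the file lives at the site root):

```
User-agent: *
Disallow: /pdfs/
```

This blocks crawling of everything under /pdfs/ for all compliant bots, while leaving the product pages themselves crawlable.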

Am I overdoing it? Should I unrestrict them and just let Google sort it out? Do you think having so many pages restricted is a red flag?
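One way to sanity-check that a rule like that actually blocks the PDF URLs (and nothing else) is Python's standard-library robots.txt parser. A minimal sketch, using the example paths from the post above, with example.com standing in for the real domain:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally you'd call rp.set_url("https://example.com/robots.txt") and
# rp.read(); here we parse the hypothetical rules inline instead.
rp.parse("""
User-agent: *
Disallow: /pdfs/
""".splitlines())

# The PDF directory is blocked for any bot covered by the "*" group...
print(rp.can_fetch("Googlebot", "https://example.com/pdfs/widget123.pdf"))  # False
# ...while the product pages remain crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/widget123.htm"))       # True
```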

I actually had a secondary reason for restricting them: if they did rank, users wouldn't be able to get back to my site from them. I suppose I could edit them all and add a link back to my home page....

tedster

4:02 pm on Jan 15, 2011 (gmt 0)

No, I don't think having so many restricted files is a red flag of any kind. This is a judgment call that can go either way.

I think the main concern would be using up Googlebot crawl budget on those files. If your site is getting crawled quite frequently, then that would not be a concern. If you are not already seeing the kind of re-crawl you would prefer, then I'd avoid it.

As you say, if you allow crawling, then Google will be able to sort it out and your site might pick up a bit of traffic. I always think a link or two back to your site is a good idea in a PDF file, even if you DON'T allow crawling. And if you do, those links can help circulate link equity.