Forum Moderators: goodroi
Today I did a search on the topic of one of these PDFs and, sure enough, it still comes up on Google.
The robots.txt file reads:
Disallow: /*.pdf
Disallow: *.pdf
Is this correct? Or is there a mistake in the file?
To block PDF files for Googlebot specifically, you could use:
User-agent: Googlebot
Disallow: /*.pdf$
But, and this is a very big but, this is non-standard robots.txt syntax. As far as I know only Googlebot is supposed to support the wildcard and `$` anchor. So even if you do manage to block Google, every other crawler that can fetch PDF files will still do so. (Your second line, `Disallow: *.pdf`, is invalid either way: under the original standard a Disallow value is a path prefix and must begin with `/`.)
The best bet is to block by directory or, if that isn't feasible, by individual file:
User-agent: *
Disallow: /thisdirectory/
User-agent: *
Disallow: /thisdirectory/thisfile.pdf
Disallow: /thatdirectory/thatfile.pdf
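The difference matters in practice: a parser that follows only the original exclusion standard treats a Disallow value as a literal path prefix, so the wildcard line simply never matches anything. A quick sketch with Python's standard-library `urllib.robotparser` (the `example.com` URLs and file names are placeholders, not from the thread) shows the directory rule working where the wildcard rule does not:

```python
import urllib.robotparser

# Hypothetical robots.txt mirroring the rules discussed above.
rules = """User-agent: *
Disallow: /*.pdf
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A strictly standard parser treats "/*.pdf" as a literal path
# prefix, so a real PDF URL is still considered fetchable:
print(rp.can_fetch("*", "http://example.com/file.pdf"))          # True
# The directory rule matches by prefix and blocks as intended:
print(rp.can_fetch("*", "http://example.com/private/file.pdf"))  # False
```

This is why blocking by directory (or by listing individual files) is the safe choice when you care about crawlers other than Googlebot.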
Robotstxt.org has the standard for the Robots Exclusion Protocol.