Removing a cached PDF

Is it possible?

         

robertskelton

11:42 pm on Feb 25, 2003 (gmt 0)


I've torn apart the web, especially WebmasterWorld, Google and Adobe, and couldn't find the answer, so I'm asking here:

Google have a system for removing pages from their cache. It involves placing a NOARCHIVE robots meta tag in the file, and then giving the file's URL to their automatic URL removal system.

However, PDF files can't carry robots meta tags.

Is there any easy way of removing a PDF file containing sensitive information from Google's cache, or do I just email them and wait?

yetanotheruser

12:28 am on Feb 26, 2003 (gmt 0)


You could exclude it with a robots.txt file, which is probably easiest.

If you can place your PDFs in their own folder ("/pdfs/my_sensitive_file.pdf", for example), then a /robots.txt file with the lines:

User-agent: *
Disallow: /pdfs

should do the trick afaik.
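
If you want to sanity-check the rule before uploading it, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_crawlable and the www.example.com host are made up for illustration, and this only shows how Python's parser reads the rule; it says nothing about how Googlebot or Google's cache actually behave.

```python
from urllib.robotparser import RobotFileParser

# The two lines suggested above, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it parses the text locally and never touches the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder host, not the poster's site.
    for url in (
        "http://www.example.com/pdfs/my_sensitive_file.pdf",
        "http://www.example.com/index.html",
    ):
        print(url, "->", "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Note that the rule is a prefix match, so anything whose path starts with /pdfs is covered.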

HTH :)

robertskelton

12:51 am on Feb 26, 2003 (gmt 0)


In retrospect,

User-agent: *
Disallow: /pdfs

is what should have been done.

But, if I understand Google's instructions correctly, it won't remove the PDF from the cache.

[google.com...]

yetanotheruser

1:11 am on Feb 26, 2003 (gmt 0)


As an aside: I wasn't sure whether you could name a specific file in robots.txt, but it seems you can.
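
For what it's worth, a rough sketch of the single-file case, again with Python's standard urllib.robotparser. The function name blocked_paths and the sample paths are assumptions for illustration only, and it checks Python's reading of the rule, not Google's.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that names one file instead of the whole folder.
SINGLE_FILE_RULES = """\
User-agent: *
Disallow: /pdfs/my_sensitive_file.pdf
"""


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the subset of `paths` that `agent` may not fetch.

    Hypothetical helper; parsing is local and no request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    sample = ["/pdfs/my_sensitive_file.pdf", "/pdfs/brochure.pdf", "/index.html"]
    print(blocked_paths(SINGLE_FILE_RULES, sample))
```

Only the named file ends up blocked here; the other PDFs in the folder stay crawlable.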

Regarding the PDF: the only way I can think of to remove it before the next crawl/update cycle is to rename the PDF, and then use their automatic removal system to remove the old URL. According to the page you've linked to, they will only remove pages automatically if the file no longer exists (or if you add the corresponding meta tags, which you can't, as you've said). I guess they'll send Freshbot to check that you're not being malicious.

I'm still not sure whether this will remove the cached version as well, but I would assume it would.

atb,

:)