Google have a system for removing pages from their cache. It involves placing a NOARCHIVE meta robot tag in the file, and then giving the file's URL to their automatic URL removal system.
However, PDF files don't have meta robot tags.
Is there any easy way of removing a PDF file containing sensitive information from Google's cache, or do I just email them and wait?
User-agent: *
Disallow: /pdfs
is what should have been done.
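To check that a rule like this actually blocks the PDFs, you can run it through Python's standard-library robots.txt parser. This is a minimal sketch. The example.com URLs are placeholders, and using "Googlebot" as the agent is an assumption. Python's parser only does plain prefix matching, so it approximates Googlebot's behaviour rather than reproducing it. Note that "/pdfs" is a prefix, so it would also block something like "/pdfs-archive/". As said above, this only stops future crawling. It doesn't touch what's already in the cache.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted above; the URLs below are placeholders, not a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/pdfs/report.pdf",
                "http://www.example.com/index.html"):
        print(url, "->", "allowed" if can_crawl(ROBOTS_TXT, url) else "blocked")
```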
But, if I understand Google's instructions correctly, it won't remove the PDF from the cache.
[google.com...]
Regarding the PDF: the only way I can think of to remove it before the next crawl/update cycle is to rename the PDF and then use their automatic removal system to remove the old URL. According to the page you've linked to, they will only remove pages automatically if the file no longer exists (or if you add the corresponding meta tags, which, as you've said, you can't). I guess they'll send freshbot to check that you're not being malicious.
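Before submitting the old URL, it's worth confirming that it really is gone after the rename. Here's a minimal sketch that sends a HEAD request and treats 404 or 410 as "no longer exists". Counting 410 as well as 404 is my assumption about what the removal system accepts, and `url_is_gone` is just a hypothetical helper name. Redirects are followed, so an old URL that redirects to the renamed file will show up as still reachable.

```python
import urllib.error
import urllib.request

# Assumption: these are the statuses the removal system reads as "file no longer exists".
GONE_STATUSES = {404, 410}


def url_is_gone(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if a HEAD request for `url` answers 404 or 410."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so a redirect to the renamed PDF counts as reachable.
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        # Other HTTP errors (e.g. 403, 500) do not mean the file is gone.
        return err.code in GONE_STATUSES
```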
I'm still not sure whether this will remove the cached version as well, but I would assume it does.
atb,
:)