I've added the exclusion to my robots.txt.
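The rule looks something like this (with /private-page.html standing in for the actual URL):

User-agent: *
# /private-page.html is a placeholder for the real path
Disallow: /private-page.html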
However, I haven't uploaded the page itself yet, because I know it takes some time for all of the googlebots to get word that they should not follow links to it.
Does anyone have an estimate on how long I should wait before uploading the page?
Use the WebmasterWorld server headers checker [webmasterworld.com] to determine the Expires and Cache-Control: max-age settings for your existing robots.txt. Google adheres to these settings reliably (if you provide them).
HTTP/1.1 200 OK
Date: Wed, 06 Aug 2003 16:14:01 GMT
Server: Rapidsite/Apa/1.3.27 (Unix) FrontPage/22.214.171.1240 mod_ssl/2.8.12 OpenSSL/0.9.7a
Cache-Control: must-revalidate, max-age=7200
Expires: Wed, 06 Aug 2003 18:17:30 GMT
Last-Modified: Sat, 02 Aug 2003 05:40:41 GMT
This shows that my robots.txt is to be treated as fresh for only two hours (max-age=7200 seconds); after that, a user-agent holding an older copy must re-fetch it.
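For reference, you can see the same headers from the command line with curl (example.com here is a placeholder for your own domain):

curl -I http://www.example.com/robots.txt

The -I flag makes curl send a HEAD request and print only the response headers.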
I can't add the meta tags to the page because it's a dynamic page that essentially performs some database work and then redirects to another page.
I hadn't previously set the Expires and Cache-Control: max-age headers, but I will now.
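On an Apache-based server like the one in the headers above, a minimal sketch of doing that from .htaccess might look like this (assuming mod_expires and mod_headers are enabled on your host; the two-hour lifetime is just an example):

# Hypothetical .htaccess snippet; requires mod_expires and mod_headers.
<Files "robots.txt">
  # Advertise a two-hour freshness lifetime for robots.txt.
  ExpiresActive On
  ExpiresDefault "access plus 2 hours"
  # Tell caches to re-check once that lifetime has elapsed.
  Header set Cache-Control "must-revalidate, max-age=7200"
</Files>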
Here is my scenario that didn't work:
I updated my robots.txt to Disallow a page.
Within one day I saw at least one googlebot access the new robots.txt.
One week later I uploaded the new page I didn't want spidered.
One week after that I found it in the index.
You could remove a single page using meta tags.
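That refers to the standard robots meta tag, placed in the page's <head>:

<head>
  <!-- Tells compliant crawlers not to add this page to their index -->
  <meta name="robots" content="noindex">
</head>

It won't help with a page that only redirects (as above), since there is no HTML head to put it in, but for an ordinary page it is the most direct way to keep it out of the index.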