
Forum Moderators: goodroi


robots.txt Disallow - How long until it's obeyed?

     

notsleepy

1:51 pm on Aug 6, 2003 (gmt 0)

10+ Year Member



I've created a new page, my-page.htm, that I do not want included in the Google index.

I've added the exclusion in my robots.txt:
User-agent: *
Disallow: /my-page.htm

However, I haven't uploaded the file yet because I know it takes some time for all Googlebots to pick up the new robots.txt and stop following links to that page.

Does anyone have an estimate on how long I should wait before uploading the page?
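Before waiting on any crawler, it can help to confirm that the rule itself says what you intend. Here is a minimal sketch using Python's standard urllib.robotparser. The host www.example.com and the helper name is_allowed are made up for illustration, and this only checks how a standards-following parser reads the rule, not what Googlebot actually does or when it re-reads the file.

```python
from urllib.robotparser import RobotFileParser

# The exclusion quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /my-page.htm
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: parses the rules from a string instead of
    fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# www.example.com is a placeholder host, not the poster's site.
blocked = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/my-page.htm")
other = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.htm")
print(blocked, other)  # prints: False True
```

If the first value comes back True, the rule is not matching the path you think it is, and no amount of waiting will help.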

tschild

4:00 pm on Aug 6, 2003 (gmt 0)

10+ Year Member



In my experience, Googlebot has complied with robots.txt changes virtually immediately. If you want to keep the page out of the index at all costs, you can always include <meta name="robots" content="noindex,nofollow"> in the page head.
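To double-check that a page really carries the robots meta tag mentioned above, something like the following sketch could be used. It relies only on Python's standard html.parser. The class name RobotsMetaParser and the function robots_directives are invented for this example, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # "noindex,nofollow" -> {"noindex", "nofollow"}
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Hypothetical helper: return the robots directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up sample page for illustration.
sample = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(sorted(robots_directives(sample)))  # prints: ['nofollow', 'noindex']
```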

jdMorgan

4:18 pm on Aug 6, 2003 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



notsleepy,

Use the WebmasterWorld server headers checker [webmasterworld.com] to determine what the Expires and Cache-Control: max-age settings are for your existing robots.txt. Google adheres to these settings reliably (if you provide them).

Example:

HTTP/1.1 200 OK 
Date: Wed, 06 Aug 2003 16:14:01 GMT
Server: Rapidsite/Apa/1.3.27 (Unix) FrontPage/5.0.2.2510 mod_ssl/2.8.12 OpenSSL/0.9.7a
Cache-Control: must-revalidate, max-age=7200
Expires: Wed, 06 Aug 2003 18:17:30 GMT
Last-Modified: Sat, 02 Aug 2003 05:40:41 GMT
ETag: "1b6eae6-ac6-3f2b4ed9"
Accept-Ranges: bytes
Content-Length: 2758
Connection: close
Content-Type: text/plain

This shows that my robots.txt is to be considered valid for only two hours, and must be re-fetched if the user-agent has an older copy.
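The two-hour figure can be worked out from the headers themselves. Below is a rough sketch, assuming the response headers are already in a plain dict. The function name freshness_seconds is made up for this example, and it only does the arithmetic: max-age is used when present (in HTTP/1.1 it takes precedence over Expires), otherwise Expires minus Date. It says nothing about whether a given crawler honors the result.

```python
import re
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_seconds(headers: Dict[str, str]) -> Optional[int]:
    """Return how long (in seconds) a response may be treated as fresh.

    Hypothetical helper. Returns None if the headers give no hint.
    """
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


# Header values copied from the example response above.
example = {
    "Date": "Wed, 06 Aug 2003 16:14:01 GMT",
    "Cache-Control": "must-revalidate, max-age=7200",
    "Expires": "Wed, 06 Aug 2003 18:17:30 GMT",
}
print(freshness_seconds(example))  # prints: 7200
```

With 7200 seconds, a compliant client holding a copy older than two hours would have to re-fetch robots.txt before relying on it.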

Jim

g1smd

10:48 pm on Aug 6, 2003 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



For a page that is already listed, it can take 6 weeks after adding the robots meta tag for Google to forget about the old page.

For a new and unlisted page, Google should just never index it.

notsleepy

11:16 pm on Aug 6, 2003 (gmt 0)

10+ Year Member



Thanks for the info.

tschild:
I can't add the meta tags to the page because it's a dynamic page that essentially performs some database work and redirects to another page.

jdMorgan:
I haven't previously set Expires and Cache-Control: max-age, but I will now.

Here is my scenario that didn't work:

I updated my robots.txt to Disallow a page.
Within one day I saw at least one Googlebot access my new robots.txt.
One week later I uploaded the new page I didn't want spidered.
One more week later I found it in the index.

EarWig

11:17 pm on Aug 6, 2003 (gmt 0)

10+ Year Member



You can remove pages within 24-48 hours using the Google Removal Tool if you have set the page's robots meta tag to "noindex".

[services.google.com:8882...]
Remove a single page using meta tags.

Regards
EW