
Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt Disallow - How much time till it's obeyed?
notsleepy




msg:1527849
 1:51 pm on Aug 6, 2003 (gmt 0)

I've created a new page my-page.htm that I do not want to be included in the Google index.

I've added the exclusion in my robots.txt:
User-agent: *
Disallow: /my-page.htm
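
A quick way to sanity-check a rule like this before relying on it is to run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser; the www.example.com host and the other-page.htm path are made up for illustration, and the result only shows how this particular parser reads the rule, not how any given crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, held in a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /my-page.htm
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical host, used only for illustration.
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/my-page.htm"))     # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/other-page.htm"))  # True
```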

However, I haven't uploaded the file yet because I know it takes some time for all Googlebots to get word that they should not follow links to that page.

Does anyone have an estimate on how long I should wait before uploading the page?

 

tschild




msg:1527850
 4:00 pm on Aug 6, 2003 (gmt 0)

In my experience Googlebot has complied with robots.txt changes virtually immediately. If you want to keep the page out of the index at all costs, you can always include <meta name="robots" content="noindex,nofollow"> in the page head.
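
For anyone who wants to confirm that a page really carries the robots meta tag, here is a minimal sketch using Python's standard html.parser. The sample HTML and the robots_directives helper are made up for illustration; the check only reports what is in the markup, not what a crawler will do with it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, for illustration only.
SAMPLE = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'
print(sorted(robots_directives(SAMPLE)))  # ['nofollow', 'noindex']
```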

jdMorgan




msg:1527851
 4:18 pm on Aug 6, 2003 (gmt 0)

notsleepy,

Use the WebmasterWorld server headers checker [webmasterworld.com] to determine what your Expires and Cache-Control: max-age settings are for your existing robots.txt. Google adheres to these settings reliably (if you provide them).

Example:

HTTP/1.1 200 OK 
Date: Wed, 06 Aug 2003 16:14:01 GMT
Server: Rapidsite/Apa/1.3.27 (Unix) FrontPage/5.0.2.2510 mod_ssl/2.8.12 OpenSSL/0.9.7a
Cache-Control: must-revalidate, max-age=7200
Expires: Wed, 06 Aug 2003 18:17:30 GMT
Last-Modified: Sat, 02 Aug 2003 05:40:41 GMT
ETag: "1b6eae6-ac6-3f2b4ed9"
Accept-Ranges: bytes
Content-Length: 2758
Connection: close
Content-Type: text/plain

This shows that my robots.txt is to be considered valid for only two hours, and must be re-fetched if the user-agent has an older copy.
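
To make the arithmetic explicit, here is a rough Python sketch that works out the lifetime from the headers in the example. The header values are copied from above; the two helper functions are made up for illustration. Under HTTP/1.1, max-age takes precedence over Expires when both are present, which is why the two-hour figure comes from max-age=7200 even though Expires lands a few minutes later.

```python
from email.utils import parsedate_to_datetime

# The relevant headers from the example response above.
HEADERS = {
    "Date": "Wed, 06 Aug 2003 16:14:01 GMT",
    "Cache-Control": "must-revalidate, max-age=7200",
    "Expires": "Wed, 06 Aug 2003 18:17:30 GMT",
}


def max_age_seconds(cache_control: str):
    """Return the max-age value in seconds, or None if the directive is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def expires_lifetime_seconds(headers: dict) -> int:
    """Return the lifetime implied by Expires minus Date, in seconds."""
    date = parsedate_to_datetime(headers["Date"])
    expires = parsedate_to_datetime(headers["Expires"])
    return int((expires - date).total_seconds())


print(max_age_seconds(HEADERS["Cache-Control"]))  # 7200 (two hours)
print(expires_lifetime_seconds(HEADERS))          # 7409 (2 h 3 min 29 s)
```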

Jim

g1smd




msg:1527852
 10:48 pm on Aug 6, 2003 (gmt 0)

For a page that is already listed, it can take 6 weeks after adding the robots meta tag for Google to forget about the old page.

For a new and unlisted page, Google should just never index it.

notsleepy




msg:1527853
 11:16 pm on Aug 6, 2003 (gmt 0)

Thanks for the info.

tschild:
I can't add the meta tags to the page because it's a dynamic page that essentially performs some database work and redirects to another page.

jdMorgan:
I haven't previously set the Expires and Cache-Control: max-age headers, but I will now.

Here is my scenario that didn't work:

I updated my robots.txt to Disallow a page.
Within one day I saw at least one Googlebot access my new robots.txt.
One week later I uploaded the new page I don't want spidered.
One more week later I found it in the index.

EarWig




msg:1527854
 11:17 pm on Aug 6, 2003 (gmt 0)

You can remove pages within 24-48 hours using the Google Removal Tool if you have set the page's robots meta tag to "noindex".

[services.google.com:8882...]
Remove a single page using meta tags.

Regards
EW
