Do search engines cache robots.txt?

3:04 pm on Apr 16, 2003 (gmt 0)

trillianjedi (Senior Member, joined: Apr 15, 2003, posts: 7242)


Hi,

Don't worry about the welcome message, I'm not new here, just changed name (my previous login name became a keyword for a site of mine!). In fact, if one of the admins could send me a sticky about deleting my old posts (only about 4), I'd be grateful.

Question:

I'm doing a new commercial site (my first, actually) which will launch in about six months' time. I do not want it indexed by any search engines during the build (and material will have to be posted to test it out). If I block all robots in my robots.txt, can I just remove the block a month or so before we're ready to go, so the site is then SE-friendly? If any of the engines cache the robots.txt file, or make a note for future reference not to bother coming back, that could be a problem!
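
(For reference, the block I have in mind is the usual everything-disallowed robots.txt, something along these lines:)

User-agent: *
Disallow: /

(i.e. every well-behaved robot is told to keep out of the whole site.)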

Thanks,

TJ

3:17 pm on Apr 16, 2003 (gmt 0)

jdmorgan (Senior Member, joined: Mar 31, 2002, posts: 25430)


TJ,

The longest time I've ever heard of a 'bot saving robots.txt is a day. So yes, your plan should work.

Jim

3:19 pm on Apr 16, 2003 (gmt 0)

macguru (Senior Member, joined: Dec 30, 2000, posts: 3300)


This query brings up some robots.txt files from the index: allinurl: robots.txt

It seems they are cached by Google.

I've never had problems before when swapping the robots.txt file, regardless of the old cached version.

4:32 pm on Apr 16, 2003 (gmt 0)

trillianjedi (Senior Member, joined: Apr 15, 2003, posts: 7242)


Thanks for the response.

Even if they are cached, presumably that doesn't last forever and it's a case of us timing it right? Maybe we remove the robots file a couple of months before launch and point to a holding page of some sort? Or is that bad form?

I can't think of any other way of stalling a listing. We have a handful of good-PR sites that want to link to us when we're operational. I suspect we will get fresh-botted quite quickly (maybe within a week) and indexed.

I can always ask the sites not to add the link until we're ready, but you know what people are like. I'd rather have control over the search engine side of it instead.

TJ

4:46 pm on Apr 16, 2003 (gmt 0)

Preferred Member (joined: Apr 6, 2003, posts: 630)


Google picks up a new version of robots.txt at least once a day, or whenever it has fetched a number of pages from your site:

In order to save bandwidth Googlebot only downloads the robots.txt file once a day or whenever we have fetched many pages from the server.

- [google.com...]

"Maybe we remove the robots file a couple of months before launch"

I would keep an eye on the deep crawl schedule and remove robots.txt in the crawl before your launch.

A list of robots.txt files indexed by Google [google.com] in PR order - CNN have some funny messages on their banned pages ;)

4:52 pm on Apr 16, 2003 (gmt 0)

Preferred Member (joined: Nov 30, 2001, posts: 373)


I typically put a password on my new sites while I'm developing them. That's because not only do I not want robots crawling them, but I also don't want potential customers/users getting confused or turned off, and I don't want competitors to know what I'm up to.

It's a pretty simple .htaccess file, typically, so I don't see it as being all that much work.

Perhaps that would better suit your needs and you wouldn't have to worry about your robots.txt file being cached.
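
A minimal Basic Auth setup in Apache might look something like this (assuming you've already created a password file with the htpasswd tool; the /home/example/.htpasswd path is just a placeholder for wherever you keep yours):

AuthType Basic
AuthName "Development site - authorised users only"
AuthUserFile /home/example/.htpasswd
Require valid-user

With that in place, nothing - robot or human - gets a page without logging in, so caching of robots.txt never comes into play.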

4:57 pm on Apr 16, 2003 (gmt 0)

jdmorgan (Senior Member, joined: Mar 31, 2002, posts: 25430)


Even if a robots.txt file is indexed and cached by Google, I doubt that Googlebot would use it. It would be in the "wrong space" for Googlebot to use. I'd be willing to bet that the 'bot references robots.txt by appending that filename to the domain name being spidered, rather than using any data from Google's index - it just wouldn't make any sense to do so.

It's strange that anyone would link to a robots.txt file, but I guess it could happen occasionally on a site like WebmasterWorld where people cite the robots.txt file here as an example.

TJ, if you remove your disallows from robots.txt a week before you want to go live, you should be fine. You should time that to coincide with the deep crawl. A single "holding page" for test purposes would be a great idea, though, just to boost your confidence.
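
(By "remove your disallows" I just mean leaving the file in place but opening everything up, along the lines of:)

User-agent: *
Disallow:

An empty Disallow line tells compliant robots that nothing is off limits.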

Jim

7:22 pm on Apr 16, 2003 (gmt 0)

Junior Member (joined: Mar 10, 2003, posts: 146)


Just a touch off topic here, but after doing an allinurl: robots.txt Google search, I spotted WebmasterWorld's robots.txt. Terrific listing of all our friendly neighborhood spiders/bots.

10:42 pm on Apr 16, 2003 (gmt 0)

New User (joined: Apr 8, 2003, posts: 6)


Thanks JD, I think I'll do just that. If we're a month late in getting decent SERPs, I can live with that...

TJ

11:18 pm on Apr 16, 2003 (gmt 0)

Preferred Member (joined: Apr 6, 2003, posts: 630)


"It's strange that anyone would link to a robots.txt file"

22 people link to Google's robots.txt.

One of the sites linking to them proposes a 'forbidden web' of only content banned by robots.txt ;)

A cool idea, but one I feel would not be very popular around here :)

