| 3:17 pm on Apr 16, 2003 (gmt 0)|
The longest time I've ever heard of a 'bot saving robots.txt is a day. So yes, your plan should work.
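For reference, the "block everything now, open up at launch" plan only needs a two-line robots.txt while the site is in development (a minimal sketch; the paths are whatever your site actually uses):

```
User-agent: *
Disallow: /
```

At launch you either delete the file or change the rule to an empty `Disallow:`, which permits everything.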
| 3:19 pm on Apr 16, 2003 (gmt 0)|
This query brings up some robots.txt files in the index: allinurl: robots.txt
It seems they are cached by Google.
I've never had problems swapping a robots.txt file, despite the old cached version.
| 4:32 pm on Apr 16, 2003 (gmt 0)|
Thanks for the response.
Even if they are cached, presumably that doesn't last forever and it's a case of us timing it right? Maybe we remove the robots file a couple of months before launch and point to a holding page of some sort? Or is that bad form?
I can't think of any other way of stalling a listing. We have a handful of good PR sites who want to link to us when we're operational. I suspect we will get fresh-botted quite quickly (maybe within a week) and indexed.
I can always ask the sites not to do the link until we're ready, but you know what people are like. I'd rather have control over the search engines instead.
| 4:46 pm on Apr 16, 2003 (gmt 0)|
Google fetches a fresh copy of robots.txt once a day, or whenever it has requested a number of pages from your site:
|In order to save bandwidth Googlebot only downloads the robots.txt file once a day or whenever we have fetched many pages from the server. |
|Maybe we remove the robots file a couple of months before launch |
I would keep an eye on the deep crawl schedule and remove robots.txt in the crawl before your launch.
A list of robots.txt files indexed by Google [google.com] in PR order - CNN has some funny messages on its banned pages ;)
| 4:52 pm on Apr 16, 2003 (gmt 0)|
I typically put a password on my news sites while I'm developing them. That's because not only do I not want robots crawling it, but I also don't want potential customers/users getting confused or turned off and I don't want competitors to know what I'm up to.
It's a pretty simple .htaccess file, typically, so I don't see it as being all that much work.
Perhaps that would better suit your needs and you wouldn't have to worry about your robots.txt file being cached.
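To illustrate, the whole .htaccess really can be this short (a sketch assuming Apache with basic auth enabled; the AuthUserFile path is hypothetical and the .htpasswd file has to exist already):

```
AuthType Basic
AuthName "Site under development"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Robots and casual visitors alike then get a 401 instead of your content, so there's nothing to index or cache in the first place.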
| 4:57 pm on Apr 16, 2003 (gmt 0)|
Even if a robots.txt file is indexed and cached by Google, I doubt that Googlebot would use it. It would be in the "wrong space" for Googlebot to use. I'd be willing to bet that the 'bot references robots.txt by appending that filename to the domain name being spidered, rather than using any data from Google's index - it just wouldn't make any sense to do so.
It's strange that anyone would link to a robots.txt file, but I guess it could happen occasionally on a site like WebmasterWorld where people cite the robots.txt file here as an example.
TJ, if you remove your disallows from robots.txt a week before you want to go live, you should be fine. You should time that to coincide with the deep crawl. A single "holding page" for test purposes would be a great idea, though, just to boost your confidence.
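That "wrong space" point can be sketched with Python's standard urllib.robotparser: a well-behaved crawler builds the robots.txt URL from the host it is about to spider, then applies whatever rules it fetches. (The example.com URLs and the /private/ rule here are hypothetical.)

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_url_for(page_url):
    """Build the robots.txt URL from the host serving page_url -
    the crawler derives it from the domain, not from any cached copy."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical rules, as a crawler would fetch them from the live site.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(robots_url_for("http://www.example.com/news/story.html"))
# -> http://www.example.com/robots.txt
print(parser.can_fetch("Googlebot", "http://www.example.com/private/x.html"))
# -> False
print(parser.can_fetch("Googlebot", "http://www.example.com/news/story.html"))
# -> True
```

So once you swap the file on the server, the next fetch sees the new rules; whatever old copy sits in the search index is irrelevant to the bot.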
| 7:22 pm on Apr 16, 2003 (gmt 0)|
Just a touch off topic here, but after doing an allinurl: robots.txt Google search, I spotted WW's robots.txt. Terrific listing of all our friendly neighborhood spiders/bots.
| 10:42 pm on Apr 16, 2003 (gmt 0)|
Thanks JD, I think I'll do just that. If we're late in getting decent SERPS by a month, I can live with that....
| 11:18 pm on Apr 16, 2003 (gmt 0)|
|It's strange that anyone would link to a robots.txt file |
22 people link to google's robots.txt
One of the sites linking to them proposes a 'forbidden web' of only content banned by robots.txt ;)
A cool idea, but one I feel would not be very popular around here :)