Don't worry about the welcome message, I'm not new here, I just changed my name (my previous login name became a keyword for a site of mine!). In fact, if one of the admins could send me a sticky about deleting my old posts (only about 4), I'd be grateful.
I'm doing a new commercial site (my first, actually) which will launch in about 6 months' time. I do not want it indexed by any search engines during the build (and stuff will have to be posted to test it out). If I block everything in my robots.txt, can I just remove the block a month or so before we're ready to go, and will the site then be SE friendly? If any of them cache the robots file, or take note for future reference not to bother, then that could be a problem!
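For reference, a block-everything robots.txt is just two lines (`User-agent: *` followed by `Disallow: /`). The sketch below is only an illustration, not how any particular search engine works: it uses Python's standard `urllib.robotparser` to show how a compliant crawler would read the file during the build and again after the block is lifted. The example.com URL and the `is_allowed` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# robots.txt used during the build: every compliant robot is shut out.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# robots.txt after the block is lifted: an empty Disallow permits everything.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical page on the site under construction.
url = "http://www.example.com/products/widget.html"

during_build = is_allowed(BLOCK_ALL, url)
after_unblock = is_allowed(ALLOW_ALL, url)
print(during_build, after_unblock)
```

Deleting the file altogether should have the same effect as the second version, since a missing robots.txt is normally treated as "no restrictions".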
Even if they are cached, presumably that doesn't last forever and it's a case of us timing it right? Maybe we remove the robots file a couple of months before launch and point to a holding page of some sort? Or is that bad form?
I can't think of any other way of stalling a listing. We have a handful of good PR sites who want to link to us when we're operational. I suspect we will get fresh-botted quite quickly (maybe within a week) and indexed.
I can always ask the sites not to do the link until we're ready, but you know what people are like. I'd rather have control over the search engines instead.
In order to save bandwidth Googlebot only downloads the robots.txt file once a day or whenever we have fetched many pages from the server.
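To make the caching rule quoted above concrete, here is a rough Python sketch of a crawler deciding when to download robots.txt again. It is a guess at the general shape only, not Googlebot's actual logic: the `RobotsCacheEntry` and `needs_refetch` names are invented, and the 1000-page threshold is an arbitrary stand-in for "many pages", which the quote does not quantify.

```python
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60

@dataclass
class RobotsCacheEntry:
    fetched_at: float            # when robots.txt was last downloaded (epoch seconds)
    pages_since_fetch: int = 0   # pages crawled from this host since that download

def needs_refetch(entry: RobotsCacheEntry, now: float,
                  max_age: float = DAY_SECONDS,
                  max_pages: int = 1000) -> bool:
    """Refetch if the cached copy is a day old or many pages were crawled since.

    max_pages is an assumed value chosen purely for illustration.
    """
    too_old = now - entry.fetched_at >= max_age
    too_many = entry.pages_since_fetch >= max_pages
    return too_old or too_many

# Cached copy one hour old, few pages crawled: keep using it.
fresh = needs_refetch(RobotsCacheEntry(fetched_at=0.0), now=3600.0)

# Cached copy a full day old: download robots.txt again.
stale = needs_refetch(RobotsCacheEntry(fetched_at=0.0), now=float(DAY_SECONDS))

# Cached copy one hour old, but many pages crawled since: download again.
busy = needs_refetch(RobotsCacheEntry(fetched_at=0.0, pages_since_fetch=1000), now=3600.0)
```

On that reading, a removed block would be picked up within roughly a day of the next visit, so the cached copy itself should not be what holds up a listing.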
Maybe we remove the robots file a couple of months before launch
I would keep an eye on the deep crawl schedule and remove robots.txt in the crawl before your launch.
List of robots.txt files indexed by Google [google.com] in 'PR' order - CNN has some funny messages on banned pages ;)
It's a pretty simple .htaccess file, typically, so I don't see it as being all that much work.
Perhaps that would better suit your needs and you wouldn't have to worry about your robots.txt file being cached.
It's strange that anyone would link to a robots.txt file, but I guess it could happen occasionally on a site like WebmasterWorld where people cite the robots.txt file here as an example.
TJ, if you remove your disallows from robots.txt a week before you want to go live, you should be fine. You should time that to coincide with the deep crawl. A single "holding page" for test purposes would be a great idea, though, just to boost your confidence.