Forum Moderators: open
If you use any of the standard exclusion techniques (robots.txt, meta tags etc.), then it is likely to be a very long time before robots come back to index your pages once you actually want them to.
Having said that, I think Googlebot checks robots.txt on a fairly regular basis, but either way I wouldn't rely on it.
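For the meta-tag approach mentioned above, a minimal sketch of a page-level exclusion tag (placed in the <head> of each page you want kept out of the index) looks like this:

```html
<!-- Ask compliant robots not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

Note that, as with robots.txt, this only keeps out well-behaved crawlers, and you'd have to remember to remove it from every page once you want the site indexed.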
I would either:
a) Not put them anywhere on the public Internet until you actually want them spidered
or
b) Use a completely different URL for your test version, and then change the URL to something that can be spidered once you want to have the content indexed.
I recently stopped Google visiting a site until it was finished by using the robots.txt file, and when I removed the block from the file it was only a matter of days before Google started spidering. I did make sure that I had quite a lot of links to the site before I started allowing spiders, though. This seemed to work, and I would make sure that you do start getting links while you are developing the site (if you can).
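For reference, a sketch of the kind of robots.txt block described above, which asks all compliant crawlers to stay out of the whole site during development (remove the Disallow rule, or replace the file, once you want to be indexed):

```
# Block all compliant robots from the entire site during development
User-agent: *
Disallow: /
```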
How do I make sure that Google won't spider any pages until I am ready?
Turn off the Google Toolbar or use a different browser while you're in development.
Also, turn off indexes in .htaccess if you don't need them.
Add this line to .htaccess:
Options -Indexes
Or just add -Indexes if you already have an Options directive. You can do this at root level if you want it throughout the site, or just in your development subdirectory by using the appropriate .htaccess file. This is more of a general tip for development... I don't think it will necessarily impede spidering, unless googlebot tries to crawl a directory listing. Anyway, it keeps people and bots/agents from browsing your files. [edited by: Dolemite at 11:15 am (utc) on July 29, 2003]
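As a sketch, here's what adding -Indexes to an existing Options directive in a development directory's .htaccess might look like (the FollowSymLinks option shown is purely illustrative; keep whatever options you already use, and note that your host must allow Options overrides in .htaccess):

```
# Keep any options already in effect, but disable automatic
# directory listings in this directory and everything below it
Options +FollowSymLinks -Indexes
```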