Msg#: 3274413 posted 8:48 pm on Mar 7, 2007 (gmt 0)
I have a directory on my web server, called 'dev', where I keep my experimental files - the ones under development. My question is: how do I exclude that whole directory from the bots, when it sits alongside the 'www' directory where the real web files live?
For instance, my real homepage is located here on the server: /www/index.asp And my development file is here: /dev/index.asp
I'm wondering if each directory needs its own robots.txt file, or if a single one can cover both?
Msg#: 3274413 posted 6:24 pm on Mar 8, 2007 (gmt 0)
Here is what I've done - I'll post later on whether it worked:
I put the modified robots.txt file at the root level: /robots.txt
And I also left the old one where it was: /www/robots.txt
What I will do is check back tomorrow, look in Google Webmaster Tools, and see which robots.txt Google has cached for the site. Hopefully it will be the modified one, so that I can delete the other.
Msg#: 3274413 posted 1:58 am on Mar 10, 2007 (gmt 0)
So, what I need to do is create a second robots.txt and place it in the other directory - the root of the development site.
I think I misread your earlier posts. My assumptions now are:
- the production and development sites are separate (sub)domains (I originally thought your dev site was a subdirectory)
- you want to allow all bots in the production directory (/www/)
- you want to exclude all bots in the development directory (/dev/)
Therefore, use the following files...

In the production site's robots.txt (allow all bots):

User-agent: *
Disallow:

In the development site's robots.txt (exclude all bots):

User-agent: *
Disallow: /
You can use the robots.txt tool in Google Webmaster Tools to verify which URLs are allowed and disallowed for Googlebot. You can tweak the rules from the cached version in the form and then update the file on your site with the final version. Not sure how often they update the cache with a new file...
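If you'd rather not wait for Google to refresh its cached copy, you can also sanity-check the rules locally. A minimal sketch using Python's standard-library robots.txt parser - the dev.example.com hostname is just a placeholder for your development site:

```python
# Sketch: checking robots.txt rules locally with Python's standard
# library, instead of waiting for Google's cache to refresh.
from urllib.robotparser import RobotFileParser

# The "exclude all bots" file suggested for the development site:
dev_rules = RobotFileParser()
dev_rules.parse([
    "User-agent: *",
    "Disallow: /",
])

# The "allow all bots" file suggested for the production site:
prod_rules = RobotFileParser()
prod_rules.parse([
    "User-agent: *",
    "Disallow:",
])

# Every URL on the dev site should be blocked for any crawler:
print(dev_rules.can_fetch("Googlebot", "http://dev.example.com/index.asp"))   # False

# The production site stays open:
print(prod_rules.can_fetch("Googlebot", "http://www.example.com/index.asp"))  # True
```

Note this only tests the rules themselves; it won't tell you which copy of robots.txt Google actually has cached - you still need the Webmaster Tools report for that.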