phranque

msg:3274699 | 1:00 am on Mar 8, 2007 (gmt 0) |
there is only one robots.txt file that matters - in the root directory. your solution is correct. however, you may as well exclude all well-behaved robots:

User-agent: *
Disallow: /dev/

this page has the robots.txt standard [robotstxt.org].
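For anyone who wants to sanity-check a rule like this before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the "Googlebot" agent string, and the example.com URLs are assumptions for illustration only; real crawlers may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The rule set suggested in the post above.
RULES = """\
User-agent: *
Disallow: /dev/
"""

def is_allowed(rules_text, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical URLs on a placeholder domain.
dev_ok = is_allowed(RULES, "http://www.example.com/dev/index.html")
home_ok = is_allowed(RULES, "http://www.example.com/index.html")
print("dev page allowed:", dev_ok)
print("home page allowed:", home_ok)
```

This only tells you how the rules read, not which file a crawler actually picked up from your server.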
|
joebray

msg:3275382 | 4:25 pm on Mar 8, 2007 (gmt 0) |
Thanks Phranque, but I think I may have the robots.txt file placed in the wrong directory on the server. I have it inside the production website directory:

/www/robots.txt

But if I'm understanding you correctly, it should be located here instead:

/robots.txt

Does that sound right? Is there any way to test this sort of thing?
|
joebray

msg:3275535 | 6:24 pm on Mar 8, 2007 (gmt 0) |
Here is what I've done - I'll post later on whether it worked or not. I put the modified robots.txt file at the root level:

/robots.txt

And I also left the old one where it was:

/www/robots.txt

What I will do is check back tomorrow and look in the Google Webmaster Tools, and see which robots.txt Google has cached for the website. Hopefully I will see the modified one, so that I can delete the other.

Joe
|
phranque

msg:3275933 | 1:26 am on Mar 9, 2007 (gmt 0) |
sorry - to be clear i meant the root directory of the domain. it's the directory that contains your index.html (or whatever gets served) when you request http://www.example.com/, not the root directory of your file system!
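To make the "root of the domain" point concrete: a crawler builds the robots.txt address from the scheme and host of whatever page it wants, and ignores the rest of the path. A small sketch of that mapping, where `robots_url` is a made-up helper name and dev.example.com is a hypothetical subdomain:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL that governs `page_url`.

    Only the scheme and host are kept; the path, query and fragment
    are replaced, because robots.txt lives at the root of the host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical URLs for illustration.
for url in (
    "http://www.example.com/",
    "http://www.example.com/dev/page.html",
    "http://dev.example.com/a/b?x=1",
):
    print(url, "->", robots_url(url))
```

Note that a page under /dev/ on www.example.com is still governed by the www file, while a separate host gets its own robots.txt.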
|
joebray

msg:3276682 | 4:08 pm on Mar 9, 2007 (gmt 0) |
Thanks phranque, that does seem to be the case. I checked to see what Google has cached this morning, and it is the one that sits alongside the main index.asp page of the production site - its root. So, what I need to do is create a second robots.txt and place it into the other directory - the root of the development page. Thanks for helping me work thru this... Joe
|
phranque

msg:3277204 | 1:58 am on Mar 10, 2007 (gmt 0) |
"So, what I need to do is create a second robots.txt and place it into the other directory - the root of the development page."

i think i misread your earlier posts. my assumptions now are:
- the production and development sites are separate (sub)domains (i originally thought your dev site was a subdirectory)
- you want to allow all bots in the production directory (/www/)
- you want to exclude all bots in the development directory (/dev/)

therefore use the following files...

/www/robots.txt:

/dev/robots.txt:
User-agent: *
Disallow: /

you can use the robots.txt tool in the google webmaster tools to verify which urls are allowed and disallowed for googlebot. you can tweak the cached version of the file in the tool's form and then update the file on your site with the final version. not sure how often they refresh the cache with a new file...
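If you want a rough offline check alongside the webmaster tools, something like the sketch below could work. It assumes the two sites are served from the hypothetical hosts www.example.com and dev.example.com, and that the production file is empty (nothing disallowed), which is only one way of allowing all bots. The paths tested are made up, and Python's parser is not guaranteed to match googlebot in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents, one robots.txt per host root.
ROBOTS_BY_HOST = {
    "www.example.com": "",  # assumption: empty file, nothing disallowed
    "dev.example.com": "User-agent: *\nDisallow: /\n",
}

def build_parser(rules_text):
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser

def check(host, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` on `host`."""
    parser = build_parser(ROBOTS_BY_HOST[host])
    return parser.can_fetch(agent, f"http://{host}{path}")

# Hypothetical paths; swap in real ones from your own sites.
results = {
    (host, path): check(host, path)
    for host in ROBOTS_BY_HOST
    for path in ("/", "/index.asp", "/reports/q1.html")
}
for (host, path), allowed in results.items():
    print(host, path, "allowed" if allowed else "disallowed")
```

Each host is checked against its own file only, which mirrors how a crawler treats separate (sub)domains.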
|
joebray

msg:3279425 | 3:09 pm on Mar 12, 2007 (gmt 0) |
Thanks phranque, for your help. I will do just that.
|
phranque

msg:3280537 | 1:11 pm on Mar 13, 2007 (gmt 0) |
please post your success or failure to help future searches on this thread...
|
System redhat

msg:3290797 | 9:47 am on Mar 23, 2007 (gmt 0) |
The following message was cut out to new thread by goodroi. New thread at: robots_txt/3290795.htm [webmasterworld.com] 6:24 am on Mar. 23, 2007 (utc -5)
|
|