Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
excluding a subdomain
Using the robots.txt file to exclude a subdomain
joebray




msg:3274415
 8:48 pm on Mar 7, 2007 (gmt 0)

I have a directory on my web server called 'dev' where I keep my experimental files, the ones under development. My question is: how do I exclude that whole directory from the bots when it sits alongside the 'www' directory where the real web files live?

For instance, my real homepage is located here on the server: /www/index.asp
And my development file is here: /dev/index.asp

I'm wondering if each directory needs its own robots.txt file? Or is it as simple as this:

# Google
User-agent: googlebot
Disallow: /dev/

Thanks for your help in advance...

Joe Bray

 

phranque




msg:3274699
 1:00 am on Mar 8, 2007 (gmt 0)

there is only one robots.txt file that matters - in the root directory.
your solution is correct.
however, you may as well exclude all well-behaved robots.
User-agent: *
Disallow: /dev/

this page has the robots.txt standard [robotstxt.org].
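
For anyone who wants to check a rule like this locally before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com and the bot name "SomeOtherBot" are placeholders, not from this thread, and the sketch assumes /dev/ is reachable as a path under the same host as the robots.txt file.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The rule from the original question: only googlebot is excluded.
GOOGLE_ONLY = """\
User-agent: googlebot
Disallow: /dev/
"""

# The broader rule: every well-behaved robot is excluded.
ALL_BOTS = """\
User-agent: *
Disallow: /dev/
"""

google_only = build_parser(GOOGLE_ONLY)
all_bots = build_parser(ALL_BOTS)

# Placeholder URLs (assumed host, paths taken from the question).
DEV_URL = "http://www.example.com/dev/index.asp"
HOME_URL = "http://www.example.com/index.asp"

if __name__ == "__main__":
    for name, parser in (("googlebot only", google_only), ("all bots", all_bots)):
        for bot in ("Googlebot", "SomeOtherBot"):
            print(name, bot, "dev:", parser.can_fetch(bot, DEV_URL),
                  "home:", parser.can_fetch(bot, HOME_URL))
```

With the googlebot-only file, other robots are still free to fetch /dev/, which is the reason for switching to the wildcard. Either way this only covers robots that choose to obey the file.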

joebray




msg:3275382
 4:25 pm on Mar 8, 2007 (gmt 0)

Thanks Phranque, but I think I may have the robots.txt file placed in the wrong directory on the server.

I have it inside the production website directory:

/www/robots.txt

But if I'm understanding you correctly, it should be located here instead:

/robots.txt

Does that sound right? Is there any way to test this sort of thing?

joebray




msg:3275535
 6:24 pm on Mar 8, 2007 (gmt 0)

Here is what I've done - I'll post later on whether it worked or not:

I put the modified robots.txt file at the root level: /robots.txt

And I also left the old one where it was: /www/robots.txt

What I will do is check back tomorrow, look in Google Webmaster Tools, and see which robots.txt Google has cached for the website. Hopefully I will see the modified one, so that I can delete the other.

Joe

phranque




msg:3275933
 1:26 am on Mar 9, 2007 (gmt 0)

sorry - to be clear i meant the root directory of the domain.

it's the directory that contains your index.html or whatever when you request http://www.example.com/

not the root directory of your file system!
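
Put another way, the location of robots.txt follows from the URL a robot requests, not from the server's file layout. A minimal sketch of that mapping, using Python's standard urllib.parse (the function name robots_txt_url and the dev.example.com host are made up for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL.

    Only the scheme and host are kept; the path is always /robots.txt,
    no matter how deep the page sits or where the files live on disk.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A page in a subdirectory still maps to the host's top-level file.
    print(robots_txt_url("http://www.example.com/dev/index.asp"))
    # A different (sub)domain maps to its own, separate file.
    print(robots_txt_url("http://dev.example.com/index.asp"))
```

So a robots.txt placed above the web root is never requested by a robot, and each hostname is looked up separately.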

joebray




msg:3276682
 4:08 pm on Mar 9, 2007 (gmt 0)

Thanks phranque, that does seem to be the case. I checked to see what Google has cached this morning, and it is the one that sits alongside the main index.asp page of the production site - its root.

So, what I need to do is create a second robots.txt and place it into the other directory - the root of the development page.

Thanks for helping me work thru this...

Joe

phranque




msg:3277204
 1:58 am on Mar 10, 2007 (gmt 0)

"So, what I need to do is create a second robots.txt and place it into the other directory - the root of the development page."

i think i misread your earlier posts.
my assumptions now are:
- the production and development sites are separate (sub)domains (i originally thought your dev site was a subdirectory)
- you want to allow all bots in the production directory (/www/)
- you want to exclude all bots in the development directory (/dev/)

therefore use the following files...

/www/robots.txt:
User-agent: *
Disallow:

/dev/robots.txt:
User-agent: *
Disallow: /

you can use the robots.txt tool in google webmaster tools to verify which urls are allowed and disallowed for googlebot.
you can tweak the cached version of the file in the tool's form, test it there, and then update the file on your site with the final version.
not sure how often they refresh the cache with a new file...
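
if you'd rather check the two files offline, here is a rough sketch with Python's standard urllib.robotparser. The hostnames www.example.com and dev.example.com are assumptions (the thread never names the real ones), and the helper is_allowed is made up for illustration; the file contents are the two files given above.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# One robots.txt per hostname, served from that site's web root.
# Hostnames are placeholders; contents match the two files in this post.
ROBOTS_BY_HOST = {
    "www.example.com": "User-agent: *\nDisallow:\n",    # allow everything
    "dev.example.com": "User-agent: *\nDisallow: /\n",  # block everything
}


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against the robots.txt of the host it belongs to."""
    host = urlsplit(url).netloc
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_HOST[host].splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.asp",
                "http://dev.example.com/index.asp"):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

an empty Disallow: line means nothing is blocked, while Disallow: / blocks every path on that host. this is only a local sanity check; the webmaster tools view still shows which file google has actually picked up.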

joebray




msg:3279425
 3:09 pm on Mar 12, 2007 (gmt 0)

Thanks phranque, for your help. I will do just that.

phranque




msg:3280537
 1:11 pm on Mar 13, 2007 (gmt 0)

please post your success or failure to help future searches on this thread...

System
redhat



msg:3290797
 9:47 am on Mar 23, 2007 (gmt 0)

The following message was cut out to new thread by goodroi. New thread at: robots_txt/3290795.htm [webmasterworld.com]
6:24 am on Mar. 23, 2007 (utc -5)

