
Forum Moderators: goodroi


Robots.txt Disallow Multiple Level Subdomains

Can a parent subdomain's robots.txt disallow spiders in a deeper subdomain?

3:24 pm on Apr 4, 2013 (gmt 0)

Preferred Member from US 

10+ Year Member

joined:Feb 20, 2002
posts: 556
votes: 0

If a site uses multiple level subdomains (subdomain2.subdomain1.domain1.com), does each subdomain need to have a robots.txt to disallow the content or can a parent robots.txt file disallow a child?

My understanding is that each subdomain needs to have its own robots.txt. Are there any exceptions?

I found an old Google Groups post [groups.google.com] where a Google rep said:
When a spider finds a URL, it takes the whole domain name (everything between 'http://' and the next '/'), then sticks a '/robots.txt' on the end of it and looks for that file. If that file exists, then the spider should read it to see where it is allowed to crawl.

If that's correct (& current), it sounds like there must be a robots.txt at subdomain2.subdomain1.domain1.com/robots.txt if you want to disallow all the content in that subdomain.
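The rule the Google rep described can be sketched in a few lines: a crawler keeps the scheme and the full hostname (every subdomain level included) and appends /robots.txt. The function name here is just illustrative, not from any crawler's actual code.

```python
from urllib.parse import urlsplit

def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt URL a crawler would fetch for a page.

    Everything between 'http://' and the next '/' -- the hostname,
    including every subdomain level -- determines which robots.txt
    applies. A parent domain's file is never consulted.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

# Each hostname gets its own file; the parent's does not apply:
print(robots_txt_url("http://subdomain2.subdomain1.domain1.com/some/page"))
# http://subdomain2.subdomain1.domain1.com/robots.txt
print(robots_txt_url("http://subdomain1.domain1.com/some/page"))
# http://subdomain1.domain1.com/robots.txt
```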

Is there any technical way that a subdomain could not have a robots.txt visible in its root and still be disallowed via robots.txt?
8:00 pm on Apr 4, 2013 (gmt 0)


WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
votes: 8

the robots.txt file must be served from the hostname to which its exclusions apply.
9:02 pm on Apr 4, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
votes: 244

If your real question is: Do I need to maintain a separate physical robots.txt file for each of my 800 wild-card subdomains? then the answer is no.

The rewrite is a little trickier than when it's happening on the same (sub)domain, but it can still be done.
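One way such a rewrite might look, as a rough sketch: in an Apache wildcard virtual host, an internal rewrite can answer every subdomain's /robots.txt request from a single physical file, so each hostname still "serves its own" robots.txt without 800 copies on disk. The domain, file path, and directory here are made-up placeholders; the exact conditions depend on how the wildcard subdomains are actually set up.

```apache
# Sketch only: one shared file answers robots.txt for all wildcard
# subdomains of example.com. Assumes a wildcard vhost whose document
# root contains /shared/robots.txt.
RewriteEngine On
# Match any single-label subdomain of example.com...
RewriteCond %{HTTP_HOST} ^[^.]+\.example\.com$ [NC]
# ...and silently serve the shared file for its robots.txt request.
RewriteRule ^robots\.txt$ /shared/robots.txt [L]
```

Because the rewrite is internal (no redirect flag), the crawler still sees a 200 response at subdomain.example.com/robots.txt, which is what the exclusion protocol requires.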