
Forum Moderators: goodroi


Robots.txt Disallow Multiple Level Subdomains

Can a parent subdomain's robots.txt disallow spiders in a deeper subdomain?

3:24 pm on Apr 4, 2013 (gmt 0)

Preferred Member from US 

10+ Year Member

joined:Feb 20, 2002
posts: 556
votes: 0

If a site uses multiple level subdomains (subdomain2.subdomain1.domain1.com), does each subdomain need to have a robots.txt to disallow the content or can a parent robots.txt file disallow a child?

My understanding is that each subdomain needs to have its own robots.txt. Are there any exceptions?

I found an old Google Groups post [groups.google.com] where a Google rep said:
When a spider finds a URL, it takes the whole domain name (everything between 'http://' and the next '/'), then sticks a '/robots.txt' on the end of it and looks for that file. If that file exists, then the spider should read it to see where it is allowed to crawl.

If that's correct (& current), it sounds like there must be a robots.txt at subdomain2.subdomain1.domain1.com/robots.txt if you want to disallow all the content in that subdomain.
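The rule the Google rep described can be sketched in a few lines: a crawler keeps the scheme and the full hostname (every subdomain level included) and appends /robots.txt. The function name here is just illustrative, not from any crawler's actual code.

```python
from urllib.parse import urlsplit

def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt URL a crawler would fetch for a page.

    Everything between 'http://' and the next '/' -- the hostname,
    including every subdomain level -- determines which robots.txt
    applies. A parent domain's file is never consulted.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

# Each hostname gets its own file; the parent's does not apply:
print(robots_txt_url("http://subdomain2.subdomain1.domain1.com/some/page"))
# http://subdomain2.subdomain1.domain1.com/robots.txt
print(robots_txt_url("http://subdomain1.domain1.com/some/page"))
# http://subdomain1.domain1.com/robots.txt
```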

Is there any technical way that a subdomain could not have a robots.txt visible in its root and still be disallowed via robots.txt?
8:00 pm on Apr 4, 2013 (gmt 0)


WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
votes: 8

the robots.txt file must be served from the hostname to which its exclusions apply.
9:02 pm on Apr 4, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
votes: 244

If your real question is: Do I need to maintain a separate physical robots.txt file for each of my 800 wild-card subdomains? then the answer is no.

The rewrite is a little trickier than when it's happening on the same (sub)domain, but it can still be done.
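One way such a rewrite might look, as a rough sketch: in an Apache wildcard virtual host, an internal rewrite can answer every subdomain's /robots.txt request from a single physical file, so each hostname still "serves its own" robots.txt without 800 copies on disk. The domain, file path, and directory here are made-up placeholders; the exact conditions depend on how the wildcard subdomains are actually set up.

```apache
# Sketch only: one shared file answers robots.txt for all wildcard
# subdomains of example.com. Assumes a wildcard vhost whose document
# root contains /shared/robots.txt.
RewriteEngine On
# Match any single-label subdomain of example.com...
RewriteCond %{HTTP_HOST} ^[^.]+\.example\.com$ [NC]
# ...and silently serve the shared file for its robots.txt request.
RewriteRule ^robots\.txt$ /shared/robots.txt [L]
```

Because the rewrite is internal (no redirect flag), the crawler still sees a 200 response at subdomain.example.com/robots.txt, which is what the exclusion protocol requires.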