
Forum Moderators: goodroi


disallowing a folder with its own domain name

subdomain being spidered although it has its own domain name

12:52 pm on Oct 26, 2005 (gmt 0)

New User

10+ Year Member

joined:Oct 8, 2005
votes: 0

I have two websites that are hosted in the same webspace.

Both have their own domain names which point to the correct folder for each.

However, Yahoo appears to be indexing both sites under the same domain name, and I am worried that this will cause a duplicate content issue, because the subfolder it spiders from within one domain will have the exact same content as the actual domain name submitted.

Can I put a robots.txt in each folder and in the root? In the root, which has its own domain name, if I disallow spidering of the folders that have their own domain names, will the spiders crawling those domain names still get to the folders the sites are in, or do they automatically go to the root of the webspace and see the robots.txt there?

The domain names point directly to the folders. So if I put a robots.txt in each folder specifying not to go outside the folder, and one in the root saying not to go into any of the other domain name folders, would this solve the problem? I am only worried that I might inadvertently disallow spidering of the other domain name folders by including a disallow in the root of the hosting's webspace.

Thank you if you can help at all.


8:52 pm on Oct 26, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
votes: 0

First thing to remember is that the robots have no knowledge of your folder structure, so all paths are relative to the domain they are requesting URLs from.

So if you have
www.example1.com pointing to a folder called public_html, and www.example2.com pointing to a subfolder of public_html called mysubdomain,
then in the robots.txt file in public_html have
User-agent: *
Disallow: /mysubdomain

The robots.txt in public_html will only be used for www.example1.com, as only a robots.txt in the domain's root folder will be used.
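Putting that together, the complete file at public_html/robots.txt (served as www.example1.com/robots.txt) would look something like this; example1.com and mysubdomain are just the placeholder names used in this thread:

```
User-agent: *
Disallow: /mysubdomain
```

Note the path is a prefix match, so this also blocks everything below /mysubdomain/ on www.example1.com.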

If you were to put a robots.txt in the mysubdomain folder for the above example, it would not be picked up for www.example1.com, but it would be for www.example2.com. However, all disallows would be relative to the mysubdomain folder, e.g. if you wanted to disallow www.example2.com/page3.htm you would have
User-agent: *
Disallow: /page3.htm
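If you want to sanity-check how a crawler would read either file, Python's standard urllib.robotparser applies the same prefix-matching rules. This is just a sketch using the placeholder domains and paths from this thread:

```python
from urllib.robotparser import RobotFileParser

# Rules as served from www.example1.com/robots.txt (the public_html folder)
site1 = RobotFileParser()
site1.parse(["User-agent: *", "Disallow: /mysubdomain"])

# Rules as served from www.example2.com/robots.txt (the mysubdomain folder)
site2 = RobotFileParser()
site2.parse(["User-agent: *", "Disallow: /page3.htm"])

# The duplicate copy under example1.com is blocked...
print(site1.can_fetch("*", "http://www.example1.com/mysubdomain/page3.htm"))  # False
# ...while the rest of example1.com stays crawlable.
print(site1.can_fetch("*", "http://www.example1.com/index.htm"))  # True

# On example2.com the same page is addressed relative to the folder root.
print(site2.can_fetch("*", "http://www.example2.com/page3.htm"))  # False
```

Each parser only knows about its own domain's file, which mirrors how real spiders treat the two domains independently even though the folders share one webspace.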