I have two websites that are hosted in the same webspace.
Both have their own domain names which point to the correct folder for each.
However, Yahoo appears to be indexing both sites under the same domain name, and I am worried this will cause a duplicate content issue, because the subfolder it spiders from within one domain will be the exact same content as the actual domain name submitted.
Can I put a robots.txt in each folder and in the root? In the root, which has its own domain name, if I disallow spidering of the folders with their own domain names, will the spiders crawling those domain names still get to the folders the sites are in, or do they automatically go to the root of the webspace and see the robots.txt there?
The domain names point directly to the folders, so if I put a robots.txt in each folder specifying not to go outside the folder, and one in the root saying not to go into any of the other domain name folders, would this solve the problem? I am only worried that I might inadvertently disallow spidering of the other domain name folders by including a disallow in the root of the hosting's webspace.
The first thing to remember is that robots have no knowledge of your folder structure, so all paths are relative to the domain they are requesting URLs from.
So if you have www.example1.com pointing to a folder called public_html, and www.example2.com pointing to a subfolder of public_html called mysubdomain, then in the robots.txt file in public_html have Disallow: /mysubdomain
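As a minimal sketch (using the example folder name mysubdomain from above), that robots.txt in public_html could look like this:

```
# robots.txt in public_html, served at www.example1.com/robots.txt
User-agent: *
Disallow: /mysubdomain/
```

The trailing slash keeps the rule scoped to the folder; a bare Disallow: /mysubdomain would also block any file or folder whose name merely starts with "mysubdomain".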
The robots.txt in public_html will only be used for www.example1.com, as only the robots.txt in a domain's root folder is ever fetched.
If you were to put a robots.txt in the mysubdomain folder for the above example, it would not be picked up for www.example1.com, but it would be for www.example2.com. However, all disallows there are relative to the mysubdomain folder, e.g. if you wanted to disallow www.example2.com/page3.htm you would have Disallow: /page3.htm
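So, continuing the example, a robots.txt placed in the mysubdomain folder might look like this (page3.htm is just the illustrative page from above):

```
# robots.txt in mysubdomain, served at www.example2.com/robots.txt
User-agent: *
Disallow: /page3.htm
```

Note the path starts at / even though the file physically sits in a subfolder, because crawlers only see paths relative to the domain root they requested.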