

Sitemaps, Meta Data, and robots.txt Forum

disallowing a folder with its own domain name
subdomain being spidered although it has its own domain name

5+ Year Member

Msg#: 765 posted 12:52 pm on Oct 26, 2005 (gmt 0)

I have two websites that are hosted in the same webspace.

Both have their own domain names which point to the correct folder for each.

However, Yahoo appears to be indexing both sites under the same domain name. I am worried this will cause a duplicate content issue, because the subdomain folder it spiders from within one domain has exactly the same content as the site at its own, separately submitted domain name.

Can I put a robots.txt in each folder and in the root? If, in the root (which has its own domain name), I disallow spidering of the folders that have their own domain names, will spiders crawling those domain names still get to the folders the sites are in, or do they automatically go to the root of the webspace and see the robots.txt there?

The domain names point directly to the folders. So if I put a robots.txt in each folder specifying not to go outside that folder, and one in the root saying not to go into any of the other domain names' folders, would this solve the problem? My only worry is that by including a disallow in the root of the hosting webspace, I might inadvertently block spidering of the other domain names' folders.

Thank you if you can help at all.




WebmasterWorld Senior Member 5+ Year Member

Msg#: 765 posted 8:52 pm on Oct 26, 2005 (gmt 0)

The first thing to remember is that robots have no knowledge of your folder structure, so all paths are relative to the domain the robot is requesting URLs from.
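To illustrate the point, here is a minimal Python sketch of how a crawler works out where robots.txt lives for any page it wants to fetch. The function name robots_url_for is hypothetical; the idea is simply that only the scheme and host of the URL are kept, so the server folder behind the URL never comes into it.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url.

    Only the scheme and host (plus any port) are kept; the page's path,
    query and fragment are dropped, so the folder the page lives in on
    the server plays no part.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Example URLs following the domains used in this thread.
for url in [
    "http://www.example1.com/mysubdomain/page3.htm",
    "http://www.example2.com/page3.htm",
]:
    print(url, "->", robots_url_for(url))
```

Both pages may be the same file on disk, but each is governed by the robots.txt of the domain it was requested from.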

So if you have
www.example1.com pointing to a folder called public_html, and www.example2.com pointing to a subfolder of public_html called mysubdomain,
then the robots.txt file in public_html should contain
User-agent: *
Disallow: /mysubdomain
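One way to sanity-check a file like this locally is Python's standard urllib.robotparser module. The sketch below assumes the two-line file above (User-agent plus Disallow) is what www.example1.com serves; it is only a local check of the rules, not a statement of how Yahoo or any other engine evaluates them.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of public_html/robots.txt, which is what is served
# as http://www.example1.com/robots.txt in this example.
EXAMPLE1_ROBOTS = """\
User-agent: *
Disallow: /mysubdomain
"""

parser = RobotFileParser()
parser.parse(EXAMPLE1_ROBOTS.splitlines())

for url in [
    "http://www.example1.com/index.htm",
    "http://www.example1.com/mysubdomain/page3.htm",
]:
    # can_fetch() only looks at the path; the host is up to the caller.
    print(url, parser.can_fetch("*", url))
```

Disallow values are prefix matches, so /mysubdomain would also block a URL such as /mysubdomain-old.htm on www.example1.com; Disallow: /mysubdomain/ is the narrower form if that matters.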

The robots.txt in public_html will only be used for www.example1.com, because a robot only reads the robots.txt in the root folder of the domain it is crawling (here http://www.example1.com/robots.txt).
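The hosting layout itself can be modelled with a small mapping from host name to document-root folder. The DOCUMENT_ROOTS table and the file_for helper below are hypothetical illustrations, not any web server's configuration format. The sketch shows which file answers each domain's /robots.txt request, and also where the duplicate content comes from: two different URLs resolve to the same file.

```python
from pathlib import PurePosixPath

# Hypothetical layout from the example: each domain name points at a
# document-root folder inside the same webspace.
DOCUMENT_ROOTS = {
    "www.example1.com": PurePosixPath("public_html"),
    "www.example2.com": PurePosixPath("public_html/mysubdomain"),
}


def file_for(host: str, url_path: str) -> PurePosixPath:
    """Map a URL path requested on host to the file the server would serve."""
    # URL paths start with "/", so strip it before joining onto the folder.
    return DOCUMENT_ROOTS[host] / url_path.lstrip("/")


# Each domain gets its robots.txt from its own document root.
for host in DOCUMENT_ROOTS:
    print(host, "/robots.txt ->", file_for(host, "/robots.txt"))

# The same file is reachable under both domain names; both lines print
# public_html/mysubdomain/page3.htm.
print(file_for("www.example1.com", "/mysubdomain/page3.htm"))
print(file_for("www.example2.com", "/page3.htm"))
```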

If you were to put a robots.txt in the mysubdomain folder for the above example, it would not be picked up for www.example1.com, but it would be for www.example2.com. All disallow paths in it are relative to the mysubdomain folder, since that folder is the root of www.example2.com; e.g. if you wanted to disallow www.example2.com/page3.htm you would have
User-agent: *
Disallow: /page3.htm
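The same local check can be applied to the second domain. This is a sketch assuming the robots.txt in the mysubdomain folder contains just the two lines above; it also shows that writing the server-side folder name into the rule would not work.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of public_html/mysubdomain/robots.txt, served as
# http://www.example2.com/robots.txt. Paths are relative to the
# mysubdomain folder, because that folder is the root of www.example2.com.
EXAMPLE2_ROBOTS = [
    "User-agent: *",
    "Disallow: /page3.htm",
]

parser = RobotFileParser()
parser.parse(EXAMPLE2_ROBOTS)

print(parser.can_fetch("*", "http://www.example2.com/page3.htm"))  # False
print(parser.can_fetch("*", "http://www.example2.com/index.htm"))  # True

# A rule that includes the server folder name never matches, because
# no URL path on www.example2.com starts with /mysubdomain/.
wrong = RobotFileParser()
wrong.parse(["User-agent: *", "Disallow: /mysubdomain/page3.htm"])
print(wrong.can_fetch("*", "http://www.example2.com/page3.htm"))  # True
```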



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved