Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
How do I block a sub-domain?
How do I block a sub-domain if the sub-domain is a virtual directory?
shaunm



 
Msg#: 4480532 posted 12:26 pm on Jul 31, 2012 (gmt 0)

Hi,

Can I block a page in my sub-domain from the robots.txt file which is in the root of my main domain?

Example:
abc.example.com/preview.aspx

How do I block the above page/url from the root level?


This sub-domain is a virtual directory. If I am going to add anything it will only be on the root directory of my main domain.

Please advise with examples.


Thanks a lot!

 

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4480532 posted 1:55 pm on Jul 31, 2012 (gmt 0)

Place the new robots.txt in your file system such that it will be accessible when
subdomain.example.com/robots.txt is requested. This is the only way to do it.
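
To illustrate the advice above, here is a minimal Python sketch that parses a hypothetical robots.txt, as it might be served at abc.example.com/robots.txt, and checks the preview page from the original question. The file contents, the `*` user agent, and the helper name `is_allowed` are assumptions for illustration only. The standard-library parser matches on the URL path and does not look at the host, so the sub-domain scoping comes entirely from where the file is served.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of the file served at http://abc.example.com/robots.txt
SUBDOMAIN_ROBOTS_TXT = """\
User-agent: *
Disallow: /preview.aspx
"""

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given rules permit user_agent to fetch url.

    Only the path of url is matched against the rules; the caller is
    responsible for using the robots.txt that belongs to the URL's host.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The preview page is disallowed; other pages on the sub-domain are not.
blocked = not is_allowed(SUBDOMAIN_ROBOTS_TXT, "http://abc.example.com/preview.aspx")
other_ok = is_allowed(SUBDOMAIN_ROBOTS_TXT, "http://abc.example.com/index.aspx")
```
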
phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4480532 posted 1:14 am on Aug 14, 2012 (gmt 0)

http://developers.google.com/webmasters/control-crawl-index/docs/robots_txt [developers.google.com]:
The robots.txt file must be in the top-level directory of the host, accessible through the appropriate protocol and port number.


It is not valid for other subdomains, protocols or port numbers. It is valid for all files in all subdirectories on the same host, protocol and port number.
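
One way to read the quoted rule is that a robots.txt file governs a URL only when protocol, host, and port all match. The Python sketch below expresses that reading. The function names `origin` and `robots_applies` and the default-port table are assumptions made for this example, not part of any crawler's API.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> tuple:
    """Return (scheme, host, port) for url, filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)

def robots_applies(robots_url: str, page_url: str) -> bool:
    """True when the robots.txt at robots_url covers page_url."""
    return origin(robots_url) == origin(page_url)
```

Under these assumptions, `robots_applies("http://example.com/robots.txt", "http://abc.example.com/preview.aspx")` is false, which is why the file in the main domain's root cannot block a page on the sub-domain.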

shaunm



 
Msg#: 4480532 posted 7:51 am on Aug 16, 2012 (gmt 0)

@phranque

Thank you so much for that, mate!

Best,

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4480532 posted 8:06 am on Aug 17, 2012 (gmt 0)

i forgot to include the reference from the actual Robots Exclusion Protocol - http://www.robotstxt.org/robotstxt.html [robotstxt.org]:
Where to put it
The short answer: in the top-level directory of your web server.
The longer answer: When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place. For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt". So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.
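
The path-stripping step described in the quote can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not code from any actual robot, and the function name `robots_txt_url` is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(url: str) -> str:
    """Replace the path (and any query or fragment) of url with /robots.txt.

    The scheme and network location (host plus optional port) are kept
    unchanged, so every host gets its own robots.txt URL.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# The example from the quoted text, and the URL from the original question.
shop = robots_txt_url("http://www.example.com/shop/index.html")
preview = robots_txt_url("http://abc.example.com/preview.aspx")
```

Applied to the page in the original question, the result points at the sub-domain's own robots.txt, never at the one in the main domain's root.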

shaunm



 
Msg#: 4480532 posted 10:38 am on Aug 17, 2012 (gmt 0)

@phranque

Once again, thanks :)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved