Forum Moderators: phranque


.htaccess to deny subdomain indexing


decola

1:01 pm on May 2, 2006 (gmt 0)

10+ Year Member



Hi all!
I manage the domain example.com with Apache2.
There are some third-level domains; among them there is 'test' subdomain.
So test.example.com is just for testing of main domain www.example.com.
The directory structure is the following:
/var/www/www.example.com/ contains some subdirectories such as 'xyz_www', 'xyz_test', ....

I don't want Google to index my test subdomain. Where should I put my 'robots.txt', and what should its contents be?

Thanks in advance,
Daniele

jdMorgan

3:55 pm on May 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The directory structure is the following:
/var/www/www.example.com/ contains some subdirectories such as 'xyz_www', 'xyz_test', ....

It appears that your hosting set-up maps your subdomain URLs to subdirectories in this way: xyz_<subdomain_name>.

If that is correct, then robots.txt for each subdomain would go into each of those subdomain-subdirectories.
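Assuming that mapping holds (the 'xyz_' directory names here are just the placeholders from the posts above), the resulting layout would look something like:

```
/var/www/www.example.com/xyz_www/robots.txt    <- fetched as http://www.example.com/robots.txt
/var/www/www.example.com/xyz_test/robots.txt   <- fetched as http://test.example.com/robots.txt
```

In other words, each subdomain gets its own robots.txt at the top of the directory that subdomain is served from, because robots always request it from the root of the host name.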

The syntax of robots.txt is defined by A Standard for Robot Exclusion [robotstxt.org]. Although this standard was never officially adopted by any "governing body," almost all 'good' (that is to say, non-malicious) spiders honor its basic requirements.

In addition, some search engine robots have defined their own proprietary extensions. These are generally documented in the "Webmaster help" sections of their Web sites.
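For example, Googlebot recognizes an "Allow" line and "*" wildcards in paths, neither of which is part of the original standard. A record using them (with hypothetical paths, purely for illustration) might look like:

```
# Googlebot-specific record using non-standard extensions
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-page.html

```

Other robots may ignore or misinterpret such extensions, so check each search engine's own documentation before relying on them.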

In order to disallow all spiders from all parts of a subdomain, the robots.txt file contents would be something like:

# Disallow all robots from all URLs
User-agent: *
Disallow: /


Lines starting with "#" are comments. Do not 'feel free' to reformat this file, as the syntax must be precise. For example, note the blank line at the end of the record - it is (or was) required by at least one second-tier robot. Also, robots.txt must be created as a plain-text (ASCII) file.
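As a quick sanity check (a sketch - the exact wording of `file`'s output can vary by system), you can create the record and confirm it was saved as plain ASCII from a shell:

```shell
# Write the robots.txt record, including the trailing blank line
printf 'User-agent: *\nDisallow: /\n\n' > robots.txt

# Confirm the file is plain ASCII text
file robots.txt
```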

As mentioned above, only 'good' robots will fetch and obey robots.txt. If it is critical that these subdomains not be accessed by robots, you may want to take stronger measures, such as using .htaccess to restrict access based on REMOTE_ADDR (for example, restrict access to clients at your IP address or within your intranet's IP address range only), or to require a password to access them.
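As a rough sketch of the IP-restriction approach (the 192.168.1 range and the AuthUserFile path are placeholders - substitute your own values), an .htaccess file in the test subdomain's directory might contain:

```
# Allow only clients from the local intranet range; deny everyone else
Order Deny,Allow
Deny from all
Allow from 192.168.1

# Alternatively, require a password instead:
# AuthType Basic
# AuthName "Test site"
# AuthUserFile /var/www/.htpasswd
# Require valid-user
```

Note that if you enable both the IP restriction and the password at the same time, Apache's default (Satisfy All) requires clients to pass both checks; add "Satisfy Any" if passing either one should be enough.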

Jim

decola

7:09 am on May 3, 2006 (gmt 0)

10+ Year Member



Thank you so much, Jim!