I don't want Google to index my test subdomain. Where should I put my 'robots.txt', and what should the file contain?
Thanks in advance,
Daniele
The directory structure is the following:
/var/www/www.example.com/ contains some subdirectories such as 'xyz_www', 'xyz_test', ....
It appears that your hosting set-up maps your subdomain URLs to subdirectories in this way: xyz_<subdomain_name>.
If that is correct, then robots.txt for each subdomain would go into each of those subdomain-subdirectories.
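As a rough illustration of that mapping, here is a small Python sketch. It assumes the xyz_<subdomain_name> pattern really holds for your host; the function name robots_path is made up for this example and is not part of any tool.

```python
from pathlib import Path


def robots_path(base_dir: str, subdomain: str) -> Path:
    """Return where robots.txt would live for one subdomain.

    Assumption: each subdomain is served from <base_dir>/xyz_<subdomain>,
    as guessed from the directory listing in the question.
    """
    return Path(base_dir) / f"xyz_{subdomain}" / "robots.txt"


# Print the candidate location for the "www" and "test" subdomains.
for name in ("www", "test"):
    print(robots_path("/var/www/www.example.com", name))
```

If the guess about the layout is wrong, the file simply needs to end up wherever the subdomain's /robots.txt URL is served from.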
The syntax of robots.txt is defined by A Standard for Robot Exclusion [robotstxt.org]. Although this standard was never officially adopted by any "governing body," almost all 'good' (that is to say, non-malicious) spiders honor its basic requirements.
In addition, some search engine robots have defined their own proprietary extensions. These are generally documented in the "Webmaster help" sections of their Web sites.
In order to disallow all spiders from all parts of a subdomain, the robots.txt file contents would be something like:
# Disallow all robots from all URLs
User-agent: *
Disallow: /
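If you want to sanity-check the rules before uploading them, one option is Python's standard urllib.robotparser module. The sketch below parses the three lines above and asks whether a robot may fetch a URL; the hostname test.example.com and the helper name is_allowed are placeholders chosen for this example, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as shown above.
ROBOTS_TXT = """\
# Disallow all robots from all URLs
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# test.example.com is a hypothetical hostname for the test subdomain.
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://test.example.com/"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://test.example.com/any/page.html"))
```

Both calls should report that fetching is not allowed, since "Disallow: /" covers every path on the host.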
As mentioned above, only 'good' robots will fetch and obey robots.txt. If it is critical that these subdomains not be accessed by robots, you may want to take stronger measures, such as using .htaccess to restrict access based on REMOTE_ADDR (for example, allowing only clients at your own IP address or within your intranet's IP address range), or requiring a password to access them.
Jim