I have a host-based virtual server configuration on iPlanet Web Server: a few primary domains and many secondary domains all point to the same server.
How can I allow Google to index only the primary domains (e.g. www.primarydomains.com/robots.txt) and keep it off the roughly 300 secondary domains, using obj.conf and robots.txt? I have just one document root directory, under which robots.txt resides.
How could I achieve the following using obj.conf (iPlanet) and robots.txt:
1. All primary domains serve robots.txt_a, which has a rule allowing only Google to crawl.
2. All secondary domains serve robots.txt_b, which has a rule blocking all crawlers.
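One way to sketch this in obj.conf is with `<Client>` conditions on the request host plus the `pfx2dir` NameTrans function to map /robots.txt to a different file per host. This is an untested sketch: the domain names are the ones from your question, the filesystem paths are placeholders, and it assumes (per iPlanet's documented behavior) that once `pfx2dir` matches and rewrites the path, later NameTrans directives no longer see /robots.txt:

```
# Inside the default object in obj.conf (paths are hypothetical)
<Object name="default">
# Primary domains: map /robots.txt to the Google-only file
<Client urlhost="(www.abc.com|www.cde.com)">
NameTrans fn="pfx2dir" from="/robots.txt" dir="/opt/docroot/robots.txt_a"
</Client>
# All other hosts (the secondary aliases) fall through to the block-all file
NameTrans fn="pfx2dir" from="/robots.txt" dir="/opt/docroot/robots.txt_b"
# ... existing NameTrans/Service directives follow ...
</Object>
```

The ordering does the host split: requests from the primary hosts are rewritten inside the `<Client>` block, so the unconditional directive below it only catches the secondary domains. Check the exact `<Client>` parameter syntax against your iPlanet version's obj.conf reference before relying on this.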
Remember that if you are adding robots.txt to already-existing sites, it is not a foolproof way to make a site invisible; this is especially true if Google is already aware of links between the sites.
You may also want to consider Google's URL removal tool, though that too has disadvantages.
I am looking to achieve:
1. Primary domains abc.com and cde.com, which share a common document root, should read /robots.txt_a, which has a rule allowing Googlebot to crawl.
2. All secondary domains, e.g. xyz.com (actually an alias of abc.com), should read /robots.txt_b, which has a rule blocking all crawlers.
This is to consolidate search results going forward and to improve search on the primary domains.
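The two files themselves could look like the following (standard Robots Exclusion Protocol syntax; crawlers obey the most specific matching User-agent group, so Googlebot picks its own group in the first file while everything else falls to the catch-all):

```
# /robots.txt_a — served to primary domains: only Googlebot may crawl
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

# /robots.txt_b — served to secondary domains: block all crawlers
User-agent: *
Disallow: /
```

An empty `Disallow:` line means "nothing is disallowed", i.e. full crawl access for that agent.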