


Allow Google to crawl only primary domains

How to achieve exclusion on multiple domains using obj.conf & robots.txt

   
11:45 am on May 30, 2007 (gmt 0)

5+ Year Member



I have a host-based domain name configuration on an iPlanet Web Server. I have a few primary domains and many secondary domains pointing to the same web server.

How can I allow only Google to index the primary domains (e.g. www.primarydomains.com/robots.txt) and not the roughly 300 secondary domains, using obj.conf and robots.txt? I have just one document root directory, under which robots.txt resides.

How could I achieve the following using obj.conf (iPlanet) and robots.txt (I have sketched what I think the two files would contain below the list):

1. All primary domains access robots.txt_a, which has a rule allowing only Google to crawl.
2. All secondary domains access robots.txt_b, which has a rule blocking all crawlers.
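
For the file contents, I think something along these lines would do it (robots.txt_a for the primary domains, robots.txt_b for the secondary domains):

robots.txt_a - allow Googlebot, block everyone else:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

robots.txt_b - block all crawlers:

User-agent: *
Disallow: /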

12:06 pm on May 30, 2007 (gmt 0)

quadrille - WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



Remember that if you are adding robots.txt to already-existing sites, it is not a perfect way to make a site invisible; this is especially true if Google is already aware of links between the sites.

You may also need to consider the 'removal tool' - but that, too, can have disadvantages.

What exactly are you trying to achieve?

12:35 pm on May 30, 2007 (gmt 0)

5+ Year Member



I am looking to achieve the following (my rough, untested obj.conf attempt is below the list):
1. Primary domains abc.com and cde.com, which share a common doc root, should read /robots.txt_a, which has a rule allowing Googlebot to crawl.
2. All secondary domains such as xyz.com (actually an alias of abc.com) should read /robots.txt_b, which has a rule blocking all crawlers.
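
Here is what I had in mind for obj.conf, using the <Client> tag and the pfx2dir NameTrans function to map /robots.txt to a different file per hostname. I have not tested this, and the /docs paths are just placeholders for my actual document root:

<Object name="default">
# Primary hosts: serve the Googlebot-only file for /robots.txt
<Client urlhost="(www.abc.com|www.cde.com)">
NameTrans fn="pfx2dir" from="/robots.txt" dir="/docs/robots.txt_a"
</Client>
# Any other host (the ~300 aliases): serve the block-all file
<Client urlhost="*~(www.abc.com|www.cde.com)">
NameTrans fn="pfx2dir" from="/robots.txt" dir="/docs/robots.txt_b"
</Client>
NameTrans fn="document-root" root="/docs"
# ... rest of the default object unchanged ...
</Object>

Would that be a reasonable way to map one URI to two different files per host, or is there a better directive for this?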

This is to consolidate search results going forward and to improve search on the primary domains.