Stop Yahoo's crawler, but only for specific domains

nrueda

5:56 pm on Nov 8, 2005 (gmt 0)



Hi,

We have a few domains hosted on the same server (same document root and IP). For example: www.MyDomain1.com/index.asp, www.MyDomain2.com/index.asp, www.MyDomain3.com/index.asp, etc.

index.asp is exactly the same file for every domain; we read the URL and select the text to display based on it.

I want to include ONLY one domain in Yahoo (for instance www.MyDomain2.com). So I need to stop the crawler when it visits the other domains; otherwise Yahoo won't allow me to include my domain in the program.

How can I accomplish this? Is there a way, using robots.txt, to tell which domain is being visited and disallow the domains I don't want crawled?

Thanks in advance,

Nestor

Dijkgraaf

11:40 pm on Nov 8, 2005 (gmt 0)



Let's assume you are on an Apache web server and can use PHP; then you could try the following:

Step 1) Make Apache parse robots.txt as if it were a PHP file
e.g. add the following to .htaccess (note that it applies to every .txt file covered by that .htaccess)
AddHandler application/x-httpd-php .txt

Step 2) Detect the HTTP_HOST.
$http_host = $_SERVER['HTTP_HOST'];

Step 3) Write a bit of logic that looks at $http_host and writes either
User-agent: Slurp
Disallow:
for the domain you want crawled (an empty Disallow allows everything), or
User-agent: Slurp
Disallow: /
for all the other domains.

P.S. You probably want to do steps 2 and 3 in a robots.php file until your code works 100%, and only then rename it to robots.txt and do step 1.
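
For what it's worth, steps 2 and 3 could be sketched roughly like this. It is only a minimal sketch, assuming PHP 7 or later; the function name robotsBody and the $allowedHosts list are made up for illustration, and www.mydomain2.com is just the example domain from the question.

```php
<?php
// Hypothetical robots.php sketch. The host below is the example domain
// from the question, not a real configuration.

// Hosts that Slurp may crawl (lowercase, without a port).
$allowedHosts = ['www.mydomain2.com'];

/**
 * Build the robots.txt body for the given Host header value.
 */
function robotsBody(string $httpHost, array $allowedHosts): string
{
    // Normalise the host: lowercase it and drop any ":port" suffix.
    $host = strtolower(preg_replace('/:\d+$/', '', trim($httpHost)));

    // An empty Disallow permits everything; "Disallow: /" blocks everything.
    $rule = in_array($host, $allowedHosts, true) ? 'Disallow:' : 'Disallow: /';

    return "User-agent: Slurp\n" . $rule . "\n";
}

// Send plain text so the crawler sees an ordinary robots.txt response.
header('Content-Type: text/plain');
echo robotsBody($_SERVER['HTTP_HOST'] ?? '', $allowedHosts);
```

Requesting this file through www.MyDomain2.com should give the empty Disallow, and any other host should get "Disallow: /". A missing Host header falls into the blocked case, which seems the safer default here.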