Two domains mirror each other

Is it possible to add a command to a robots.txt file that applies to just one?

         

Jeremy_H

9:25 pm on Dec 4, 2006 (gmt 0)

10+ Year Member



Hello,

I have two different sub domains that are mirrors of each other.

www.site.com = edit.site.com

I would like to limit search engines from crawling the edit subdomain, but if I add a robots.txt at edit.site.com/robots.txt, it will also appear at www.site.com/robots.txt.

Are there any commands I can add that will apply to just the edit sub domain?

Thanks

Quadrille

10:51 pm on Dec 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure I follow you.

edit.site.com/robots.txt and www.site.com/robots.txt are entirely separate, and would not confuse any serious SE.

Having said that, you'd do much better all round to close edit.site.com and 301 redirect it to www.site.com.

Easier, safer, much better for you, your site, your visitors and your serps.

Jeremy_H

8:39 pm on Dec 5, 2006 (gmt 0)

10+ Year Member



I agree, Quadrille.

There are better ways to set up the server.

Unfortunately, I don't have access to change the setup of this website, but the robots.txt could be changed.

Since the two domains mirror each other, the robots.txt on one domain will mirror the other.

Is there any way robots.txt could be set up to exclude certain subdomains or full paths?

Thanks

whoisgregg

9:08 pm on Dec 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is one way I've dealt with the scenario in the past. It requires the ability to edit your .htaccess and PHP. It also allows different versions for SSL sites.

Please note, I rewrote the PHP code from what I use to make it more easily readable. So, it should be considered untested code (although I did do a basic test).

.htaccess:

RewriteEngine On
RewriteRule ^robots\.txt$ robots.php [L]

robots.php:

<?php

$robots = array(
'https://dev.example.com' => '

User-agent: *
Disallow: /

',
'http://dev.example.com' => '

User-agent: *
Disallow: /

',
'https://www.example.com' => '
User-agent: *
Disallow: /

',
'http://www.example.com' => '
User-agent: *
Disallow:

',
'default' => '## default

User-agent: *
Disallow: /

'
);

// Check for SSL ($_SERVER['HTTPS'] may be unset, empty, or 'off' on plain HTTP)
$s = '';
if ( !empty($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) !== 'off' ) { $s = 's'; }

// Concatenate index key
$match = 'http'.$s.'://'.$_SERVER['HTTP_HOST'];

// Set default value
$robotstxt = $robots['default'];

// check for a host-specific value (!empty() also covers the isset() check)
if (!empty($robots[$match])) {
$robotstxt = $robots[$match];
}

// get an accurate Last-Modified time
$file_lastmod = getlastmod();
$header_lastmod = gmdate("D, d M Y H:i:s", $file_lastmod);

// send headers
header('Last-Modified: '.$header_lastmod.' GMT');
header('Content-Type: text/plain; charset=UTF-8');

// output the robots.txt (LF line endings, consistent with the rule strings above)
echo "## robots.txt for ".$match."\n\n";
echo $robotstxt;
?>
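
If PHP is not available but mod_rewrite is, a host-conditional rewrite may achieve the same split without a script. This is only a minimal sketch, assuming Apache with mod_rewrite enabled and the edit.site.com host name from the original question. The file name robots-edit.txt is hypothetical and would have to be created alongside the regular robots.txt.

```apache
RewriteEngine On

# Apply the next rule only when the Host header is the edit subdomain
# (edit.site.com is taken from the question; an optional port is tolerated).
RewriteCond %{HTTP_HOST} ^edit\.site\.com(:[0-9]+)?$ [NC]

# Serve the hypothetical robots-edit.txt in place of robots.txt.
RewriteRule ^robots\.txt$ robots-edit.txt [L]
```

Under these assumptions, robots-edit.txt would hold the blocking rules (User-agent: * followed by Disallow: /), while requests for www.site.com/robots.txt fall through to the regular file untouched.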