
Sitemaps, Meta Data, and robots.txt Forum

    
How to correctly set robots.txt for a subdomain?
please help
PatrickDeese




msg:1526089
 5:17 pm on Oct 15, 2003 (gmt 0)

I have www.example.com and sub.example.com.

sub.example.com redirects to sub.example.com/sub/ via 301.

I want to make sure that no spiders get into

www.example.com/sub/

or

sub.example.com/other-directory/

The problem is that robots.txt is at the root. I assume Google and other crawlers will ask for robots.txt at sub.example.com/robots.txt, even though the root for the subdomain is really sub.example.com/sub/.

How can I set up a separate robots.txt for the subdomain?

 

jdMorgan




msg:1526090
 5:49 pm on Oct 15, 2003 (gmt 0)

Patrick,

The problem is the 301 redirect from sub.domain.com to sub.domain.com/sub/. It tells the robot that /sub/ is a subdirectory rather than the root of the host.

You might want to use a transparent redirect (an internal rewrite) instead, and place the robots.txt for sub.domain.com in the /sub/ directory, so that a request for sub.domain.com/robots.txt is served from /sub/robots.txt.

Using a transparent redirect means that the robot will see sub.domain.com as a domain in its own right, separate and distinct from domain.com or www.domain.com. Pages will be indexed under sub.domain.com (sub.domain.com will appear in the listed URL). If this is a problem, 301-redirect the pages using .htaccess in the /sub/ folder itself; robots will then reach that level (that subdirectory) and read robots.txt before they see the redirects for the other pages.
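Because robots fetch robots.txt separately for each host name, each host can carry its own rules. A minimal Python sketch of that idea, using the standard library's urllib.robotparser: the two robots.txt bodies, the ROBOTS_BY_HOST table, the allowed() helper and the "ExampleBot" agent name are all assumptions made up for illustration, and the blocked paths simply mirror the two named in the opening question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, one per host name (not taken from the thread).
ROBOTS_BY_HOST = {
    "www.domain.com": "User-agent: *\nDisallow: /sub/\n",
    "sub.domain.com": "User-agent: *\nDisallow: /other-directory/\n",
}


def allowed(host: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the rules served by `host`."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(ROBOTS_BY_HOST[host].splitlines())
    return parser.can_fetch(agent, f"http://{host}{path}")


if __name__ == "__main__":
    checks = [
        ("www.domain.com", "/sub/page.html"),
        ("www.domain.com", "/index.html"),
        ("sub.domain.com", "/other-directory/page.html"),
        ("sub.domain.com", "/page.html"),
    ]
    for host, path in checks:
        print(host, path, allowed(host, path))
```

The same path can be blocked on one host and open on the other, which is why the file that answers sub.domain.com/robots.txt matters.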

Here's an example .htaccess rewrite:

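# Rewrite only the original request, not the already-rewritten one, to avoid a loop.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(.*)\.domain\.com [NC]
RewriteRule (.*) /%1/$1 [L]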

This is a transparent redirect: a file-path substitution only, with no redirect sent to the client. It is written for a per-directory .htaccess context; for use in httpd.conf, add a "/" ahead of the RewriteRule pattern.
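To see what the substitution does to a request, here is a rough Python model of a single pass through the two HTTP_HOST conditions and the rule. It is only a sketch: the rewrite() function and the sample requests are invented for illustration, and it models the pattern matching, not Apache itself.

```python
import re

# Python equivalents of the two HTTP_HOST patterns; [NC] maps to re.IGNORECASE.
WWW_HOST = re.compile(r"^www\.", re.IGNORECASE)
SUBDOMAIN = re.compile(r"^(.*)\.domain\.com", re.IGNORECASE)


def rewrite(host: str, path: str) -> str:
    """Map a requested path to the path that would be served internally.

    `path` is given without its leading slash, as the rule sees it in a
    per-directory (.htaccess) context. The path is returned unchanged
    (with a leading slash) when either condition fails.
    """
    if WWW_HOST.search(host):        # condition 1 is negated: skip www hosts
        return "/" + path
    match = SUBDOMAIN.search(host)   # condition 2: capture the subdomain (%1)
    if match is None:
        return "/" + path
    return f"/{match.group(1)}/{path}"   # RewriteRule (.*) /%1/$1


if __name__ == "__main__":
    print(rewrite("sub.domain.com", "robots.txt"))  # /sub/robots.txt
    print(rewrite("www.domain.com", "robots.txt"))  # /robots.txt
    print(rewrite("domain.com", "index.html"))      # /index.html
```

The first line of output is the point of the exercise: the robot asks for sub.domain.com/robots.txt and is handed the file stored at /sub/robots.txt, without ever seeing a redirect.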

Jim

PatrickDeese




msg:1526091
 6:20 pm on Oct 15, 2003 (gmt 0)

Thanks for answering.
This is what the tech support folks gave me when I requested that the subdomain be set up:

--

#!/usr/bin/perl
use strict;
use warnings;

# Host name the client asked for (empty if the header is missing).
my $host = $ENV{'HTTP_HOST'} || '';

# Default target is the main site; both subdomain host names go to /sub/.
my $url = "http://www.example.com/index.html";
if ($host eq 'www.sub.example.com' || $host eq 'sub.example.com') {
    $url = "http://sub.example.com/sub/index.html";
}

# The Location header makes the server send an external redirect;
# the HTML body is a fallback link for clients that do not follow it.
print "Location: $url\n";
print "Content-Type: text/html\n\n";
print "<HTML><HEAD><TITLE>$host</TITLE></HEAD><BODY>\n";
print "<A HREF=\"$url\">Click here to enter</A>\n";
print "</BODY></HTML>";

--

It seems pretty different from yours. Do you suppose it is compatible?

jdMorgan




msg:1526092
 6:29 pm on Oct 15, 2003 (gmt 0)

Patrick,

My code is a mod_rewrite rule for .htaccess on Apache, and yours is a Perl CGI script. If you're on Apache, the code I provided would simply replace the Perl script. It would also be processed in "native mode" by Apache, and therefore be more efficient.

If you are not hosted on Apache (e.g. you are on an IIS server) and can't set up something similar using your control panel, you could ask for advice over in the Perl scripting forum.

Jim

PatrickDeese




msg:1526093
 6:36 pm on Oct 15, 2003 (gmt 0)

My hosting is definitely Apache, but I think my hosting co. doesn't want people futzing around directly and therefore does all its redirecting etc. via Perl.

I think I will ask tech support if they can implement your method.

Thanks again for your help.

