Use .htaccess to serve correct robots.txt?

Using .htaccess and robots.txt to eliminate file duplication and 404 errors

         

iceman

10:48 pm on Oct 25, 2004 (gmt 0)

10+ Year Member



I have a site hosted on UNIX & Apache. Looking at my log files, I noticed that Google (in particular) requests a robots.txt file in the subdirectories under my root domain. Each of those requests returns a 404 error because no separate robots.txt file exists in each subdirectory.

The easy answer to "clean up" the errors would be to put a robots.txt file in every directory. That would be a pain in the a$$ because there are a few thousand sub-directories and many of them change each week.

I was thinking about adding an .htaccess file with a RewriteRule that would send a 301 Moved Permanently redirect (to the root-level robots.txt file) to Google (or any spider, for that matter). Since my root-level robots.txt file is simple, I shouldn't have to worry about rule conflicts from the subdirectories.

Based on my preliminary testing, the redirection is working fine.
Example:
When a spider requests domain.com/subdirectory/robots.txt, it is redirected to domain.com/robots.txt.
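
The rule itself is not shown in the post. A minimal sketch of what such a rule could look like, assuming mod_rewrite is enabled, the host permits rewrite directives in .htaccess (AllowOverride FileInfo), and the file sits in the document root. The pattern is an illustration, not necessarily the rule that was tested:

```apache
# Assumed location: the .htaccess file in the document root.
RewriteEngine On

# In per-directory (.htaccess) context the matched path has no leading
# slash, so the root-level "robots.txt" does not match this pattern.
# Only paths with at least one directory segment in front of it do,
# e.g. "subdirectory/robots.txt".
RewriteRule ^.+/robots\.txt$ /robots.txt [R=301,L]
```

With something like this in place, a request for domain.com/subdirectory/robots.txt should receive a 301 response pointing at the root-level robots.txt, while domain.com/robots.txt is served as usual. One caveat: a subdirectory that has its own .htaccess with rewrite directives may not inherit the rule unless RewriteOptions Inherit is set there.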

Does anybody have any comments or downsides to going with this idea?

Thanks for any help.

jdMorgan

12:05 am on Oct 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Comment:

You are fixing a problem that should not exist. robots.txt is defined as a single file per domain that resides in the Web root of that domain -- i.e. the "home page" directory.

So, the real job here is to figure out why Googlebot is getting confused, and fix that.

Do you map subdomains to subdirectories or anything like that? An error in implementing such a mapping could confuse the 'bot.

I have never seen Googlebot look for robots.txt in any subdirectory of my sites, and I've been watching since AltaVista was King.

The server headers checker in the WebmasterWorld control panel may be useful to debug this problem.

Jim