The easy answer to "clean up" the errors would be to put a robots.txt file in every directory. That would be a pain in the a$$ because there are a few thousand sub-directories and many of them change each week.
I was thinking about adding an .htaccess file with a RewriteRule that sends a 301 Moved Permanently redirect (to the root-level robots.txt file) for Google (or any spider, for that matter). Since my root-level robots.txt file is simple, I shouldn't have to worry about rule conflicts from the sub-directories.
Based on my preliminary testing, the redirection is working fine.
Example:
A spider requests domain.com/subdirectory/robots.txt and is redirected (301) to domain.com/robots.txt.
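For anyone curious, here's a rough sketch of the kind of rule I mean, assuming mod_rewrite is enabled and a single .htaccess at the Web root (domain.com above is just a placeholder):

    RewriteEngine On
    # Catch any robots.txt request that is not already at the root
    RewriteCond %{REQUEST_URI} !^/robots\.txt$
    # Redirect it to the root-level file with a 301
    RewriteRule robots\.txt$ /robots.txt [R=301,L]

The RewriteCond keeps the root-level robots.txt itself from redirecting in a loop.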
Does anybody have any comments, or see any downsides to going with this idea?
Thanks for any help.
You are fixing a problem that should not exist. robots.txt is defined as a single file per domain that resides in the Web root of that domain -- i.e. the "home page" directory.
So, the real job here is to figure out why Googlebot is getting confused, and fix that.
Do you map subdomains to subdirectories or anything like that? -- An error in implementing such a mapping function could confuse the 'bot.
I have never seen Googlebot look for robots.txt in any subdirectory of my sites, and I've been watching since AltaVista was King.
The server headers checker in the WebmasterWorld control panel may be useful to debug this problem.
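For what it's worth, a quick header check from the command line works too, assuming curl is installed (the subdirectory here is just an example path):

    # Fetch only the response headers to see the status code and Location target
    curl -I http://domain.com/subdirectory/robots.txt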
Jim