I'm currently getting spiders requesting www.widgets.com/robots.txt, and I'm wondering whether this should be redirected to widgets.com/robots.txt or should simply return a 404 Not Found.
Is there any "best practice" for redirecting a domain?
If you put up a robots.txt for www.widgets.com (for example, using mod_rewrite to serve a different robots.txt in response to requests for that domain), then you can either allow all, which would yield the same results as a 404, or disallow all. But if you disallow all, the robots would not request any resources from www.widgets.com, and so would never 'find out' that those URLs have been redirected.
Google and Ask Jeeves would then list those URLs with no title or description, only the URL, and Yahoo would list them using only the link text it found on the link it followed to reach www.widgets.com. Some other search engines might also give you a URL-only listing, while others would not list pages in that domain at all. AFAIK, Google, Yahoo, and Ask Jeeves will list any URL they find a link to, even if they obey robots.txt and don't actually fetch the page. The only way out of that is to *allow* them to fetch the page, but put a robots noindex meta tag in the HTML head of the page.
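For the separate-robots.txt option mentioned above, here is a rough sketch of what the mod_rewrite rules might look like. It assumes Apache with mod_rewrite enabled, and the file name robots-www.txt is a made-up name for illustration; adjust both to your own setup.

```apache
# Sketch only: serve a different robots.txt to requests for the www host.
# Assumes mod_rewrite is available; "robots-www.txt" is a hypothetical file name.
RewriteEngine On

# Apply the rule only when the request is for www.widgets.com
RewriteCond %{HTTP_HOST} ^www\.widgets\.com$ [NC]

# Internally rewrite /robots.txt to the www-specific file (no redirect is sent).
# The optional leading slash lets the pattern match in both .htaccess
# and server/virtual-host context.
RewriteRule ^/?robots\.txt$ /robots-www.txt [L]
```

The robots-www.txt file would then hold whichever policy you chose for the www host, allow-all or disallow-all.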
Finally, you can simply do as you are doing, and redirect even the robots.txt requests to widgets.com. This is the simplest, and I suspect the most common, method of domain redirection. In this case, the robot requests robots.txt, sees the redirect, and asks for robots.txt again in the www-less domain. This seems to avoid the URL-only listings discussed above as well.
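A minimal sketch of this kind of whole-domain redirect, again assuming Apache with mod_rewrite. The 301 status and the http:// scheme are assumptions made for the example, since the thread doesn't say which are in use.

```apache
# Sketch only: redirect every request for www.widgets.com, robots.txt included,
# to the same path on widgets.com.
RewriteEngine On

# Match requests whose Host header is www.widgets.com (case-insensitive)
RewriteCond %{HTTP_HOST} ^www\.widgets\.com$ [NC]

# Send an external redirect that preserves the requested path.
# R=301 (permanent) and the http:// scheme are assumptions.
RewriteRule ^/?(.*)$ http://widgets.com/$1 [R=301,L]
```

With rules like these, a request for www.widgets.com/robots.txt would be answered with a redirect to widgets.com/robots.txt, which matches the behavior described above.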
So, it all depends on what you want to do. I've been using this latter 'comprehensive' domain redirect method for years with no ill effect.
Jim