301 Redirect domain?

How to handle robots.txt?

         

Mr_Roberto

4:07 am on Aug 23, 2004 (gmt 0)

10+ Year Member



I have a site widgets.com, and I currently do a 301 redirect from www.widgets.com -> widgets.com (as my main site does not use the www prefix).

I'm currently seeing spiders request www.widgets.com/robots.txt, and I'm wondering whether that request should be redirected to widgets.com/robots.txt, or should simply return a 404 Not Found.

Is there any "best practice" for redirecting a domain?

Tsuren

4:21 am on Aug 23, 2004 (gmt 0)

10+ Year Member



It depends. What are you doing with widgets.com/somepage.html?

By the way, 301 is not the best decision.

Mr_Roberto

6:05 pm on Aug 23, 2004 (gmt 0)

10+ Year Member



I simply don't want any "www" variants to show up in search results. A 301 (permanent redirect) should allow PageRank to be transferred from any sites that mistakenly link to www.widgets.com rather than widgets.com. So I'm just wondering how to handle robots.txt and similar requests.

DaveAtIFG

12:49 am on Aug 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I suspect your redirect may be faulty. A request to www.example.com/robots.txt should forward to example.com/robots.txt as should a request for any other "site resource."

jdMorgan

1:13 am on Aug 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Analysis: If you return a 404-Not Found in response to requests for www.widgets.com/robots.txt, then robots will feel free to spider all resources in www.widgets.com. But there aren't any, since you've redirected them all to widgets.com.

If you put up a robots.txt for www.widgets.com (for example, using mod_rewrite to serve a different robots.txt in response to requests in that domain), then you can either allow all, which would yield the same results as a 404, or disallow all.

But if you disallow all, the robots would not request any resources from www.widgets.com, and so would never 'find out' that those resources have been redirected. Google and Ask Jeeves would then list those URLs with no title or description, only the URL, and Yahoo would list them using only the link text it found on the link it used to reach www.widgets.com. Some other search engines might also give you a URL-only listing, while others would not list pages in that domain at all.

AFAIK, Google, Yahoo, and Ask Jeeves will list any URL they find a link to, even if they obey robots.txt and don't actually fetch the page. The only way out of that is to *allow* them to fetch the page, but put a robots noindex meta tag in the HTML head of the page.
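To make the allow-all versus disallow-all difference concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name robot_may_fetch and the two robots.txt bodies are made up for illustration. It only models a robot that obeys robots.txt, not how any particular search engine lists URLs.

```python
from urllib.robotparser import RobotFileParser

# Two hypothetical robots.txt bodies that could be served for www.widgets.com.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def robot_may_fetch(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a robots.txt-obeying robot would be allowed to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://www.widgets.com/somepage.html"

# Allow all: the robot requests the page, so it gets to see the 301.
print(robot_may_fetch(ALLOW_ALL, url))     # True

# Disallow all: the robot never requests the page, so it never sees the 301.
print(robot_may_fetch(DISALLOW_ALL, url))  # False
```

In the disallow-all case the redirect is still in place on the server, but a compliant robot never makes the request that would reveal it.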

Finally, you can simply do as you are doing, and redirect even the robots.txt requests to widgets.com. This is the simplest -- and I suspect most common -- method of domain redirection. In this case, the robot requests robots.txt, sees the redirect, and asks for robots.txt again in the www-less domain. This seems to avoid the URL-only listings discussed above as well.
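The logic of that 'comprehensive' redirect can be sketched as a small Python function. This is an illustration of the decision only, not the actual server configuration (on Apache this would normally be done in the server itself, e.g. with mod_rewrite). The names CANONICAL_HOST and redirect_for are made up for the example, and plain http is assumed.

```python
from typing import Optional, Tuple

CANONICAL_HOST = "widgets.com"  # the www-less domain used in this thread


def redirect_for(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) if the request should be redirected, else None.

    Every path on the www host is treated the same way, robots.txt included.
    """
    hostname = host.lower().split(":", 1)[0]  # ignore case and any port
    if hostname == "www." + CANONICAL_HOST:
        return 301, "http://" + CANONICAL_HOST + path
    return None


print(redirect_for("www.widgets.com", "/robots.txt"))
# (301, 'http://widgets.com/robots.txt')
print(redirect_for("widgets.com", "/robots.txt"))
# None
```

Because robots.txt gets no special case, a robot asking for www.widgets.com/robots.txt is sent to widgets.com/robots.txt like any other request.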

So, it all depends on what you want to do. I've been using this latter 'comprehensive' domain redirect method for years with no ill effect.

Jim