Forum Moderators: phranque


HTTP 1.0 User Agents won't use Host Field

I want to make sure that I'm not missing anything

         

jonrichd

10:02 pm on Apr 18, 2005 (gmt 0)

10+ Year Member



I am trying to solve a problem in which a client has 50 or so domain names picked up over the years that all need to be properly redirected with a 301 Moved Permanently over to the main domain that the client is using.

For some of these domains, there are backlinks to 'deep pages' that I can properly map to a valid page on the main domain. For other domains, if there are backlinks to deep pages, I would just as soon send them to the root of the main domain.

Maindomain.com is on an IIS server, where I can't take advantage of .htaccess.

The plan was to send all these domains to a single IP address, different than that of maindomain.com, and then use the following .htaccess to sort things out:

RewriteEngine on 
# Rewrite some domains to dest site keeping pages
RewriteCond %{HTTP_HOST} ^www\.keeppages1\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages2\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages3\.com
RewriteRule (.*) http://www.maindomain.com/$1 [R=301,L]
#
# Redirect all other domains to site root
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.keeppages1\.com [NC]
RewriteCond %{HTTP_HOST} !^www\.keeppages2\.com [NC]
RewriteCond %{HTTP_HOST} !^www\.keeppages3\.com [NC]
RewriteRule (.*) http://www.maindomain.com/ [R=301,L]

This code works, but it struck me that Googlebot and other HTTP/1.0 user agents, which may not send a Host header, are all going to fall into the second ruleset and always get redirected to the root of maindomain.com. While this isn't bad, I would like to keep the deep links if I could.

I see two possible answers:

1. Set up separate virtual hosts for each of the domains I would like to keep the pages for, and give each one an .htaccess that passes through the filenames. (A pain.)

2. Use a Redirect Permanent /filename.htm http://www.maindomain.com/filename.htm prior to any of the URL rewriting in the .htaccess file.
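Concretely, option 2 would look something like this (the filenames here are just placeholders, one line per deep page I want to preserve):

```apache
# Map known deep pages one-by-one via mod_alias
Redirect permanent /widgets.htm http://www.maindomain.com/widgets.htm
Redirect permanent /about.htm http://www.maindomain.com/about.htm

# ...followed by the RewriteEngine/RewriteCond/RewriteRule block shown above
```

One caveat I'm unsure about: mod_alias's Redirect is processed separately from mod_rewrite, so placement in the file may not by itself determine which module handles a given request.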

Are either of these two going to work, or am I missing a third possibility?

Thanks in advance.

jdMorgan

10:26 pm on Apr 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jonrichd,

A third possibility:

3) Copy the first RewriteCond from the second ruleset into the first ruleset. This will disable both rulesets for true HTTP/1.0 clients, which do not send a Host request header and therefore leave {HTTP_HOST} blank.
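In other words, using the domains from your example, the first ruleset becomes:

```apache
RewriteEngine on
# Do nothing unless a non-blank Host header was received;
# the [OR] group below is AND-ed with this first condition
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^www\.keeppages1\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages2\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages3\.com
RewriteRule (.*) http://www.maindomain.com/$1 [R=301,L]
```

Since your second ruleset already starts with the same condition, no change is needed there.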

The special handling of HTTP/1.0 clients is really more directed toward preventing errors on your server; since a true HTTP/1.0 client won't provide a Host header in its request, it cannot access anything but the default server on a name-based virtual hosting server anyway. While many search engines "publish" that they are using HTTP/1.0, they really are not; you can be sure that if they list your name-based, virtually-hosted site, they are sending a Host header, and are therefore capable of handling HTTP/1.1.

Jim

jdMorgan

10:32 pm on Apr 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Additionally, you might want to consider responding to the missing-page requests with a 404-Not Found (HTTP/1.0) or 410-Gone (HTTP/1.1) response; by serving a 'friendly' 404/410 page, you inform the visitor that the page is gone, instead of leaving him/her wondering what just happened. You can provide links to your home page and/or to your site map to aid in finding the replacement page, when one exists.
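One way to sketch that in .htaccess (the path and error page below are examples, not taken from your site):

```apache
# Return 410-Gone for retired pages that have no replacement (example path)
RewriteRule ^old-catalog/ - [G]

# Serve a friendly page for the 410 response, with links
# to the home page and site map
ErrorDocument 410 /gone.html
```

The [G] flag forces a 410 response and implies [L], so no later rules are applied to those requests.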

Using loose terms, search engines have shown evidence of 'getting aggravated' when too many URLs all end up at the same page with a 200-OK response. For that reason, I recommend using the HTTP response codes as intended: If a page is gone, then the server response should indicate that.

Jim