For some of these domains, there are backlinks to 'deep pages' that I can properly map to a valid page on the main domain. For other domains, if there are backlinks to deep pages, I would just as soon send them to the root of the main domain.
Maindomain.com is on an IIS server, where I can't take advantage of .htaccess.
The plan was to send all these domains to a single IP address, different from that of maindomain.com, and then use the following .htaccess to sort things out:
RewriteEngine on
# Rewrite some domains to dest site keeping pages
RewriteCond %{HTTP_HOST} ^www\.keeppages1\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages2\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages3\.com
RewriteRule (.*) http://www.maindomain.com/$1 [R=301,L]
#
# Redirect all other domains to site root
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.keeppages1\.com [NC]
RewriteCond %{HTTP_HOST} !^www\.keeppages2\.com [NC]
RewriteCond %{HTTP_HOST} !^www\.keeppages3\.com [NC]
RewriteRule (.*) http://www.maindomain.com/ [R=301,L]
This code works, but it struck me that Googlebot and other HTTP/1.0 user agents are all going to fall into the second ruleset and always get redirected to the root of maindomain.com. While this isn't bad, I would like to keep the deep links if I could.
I see two possible answers:
1. Set up separate virtual hosts for each of the domains I would like to keep the pages for and give them a .htaccess that would pass through the filenames. (A pain.)
2. Use a Redirect permanent /filename.htm http://www.maindomain.com/filename.htm prior to any of the URL rewriting in the .htaccess file (sketched below).
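To illustrate option 2, here is a minimal sketch (the filenames are hypothetical placeholders for the real deep pages) of the mod_alias lines, placed ahead of the mod_rewrite block:

# Hypothetical deep pages; substitute the real filenames
Redirect permanent /filename.htm http://www.maindomain.com/filename.htm
Redirect permanent /another-page.htm http://www.maindomain.com/another-page.htm

One caveat: mod_alias and mod_rewrite process their directives independently, so putting the Redirect lines earlier in the file does not by itself guarantee they are applied before the RewriteRules; it is worth testing which module handles a given request first.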
Are either of these two going to work, or am I missing a third possibility?
Thanks in advance.
A third possibility:
3) Copy the first RewriteCond from the second ruleset into the first ruleset. This will disable both rulesets for true HTTP/1.0 clients, which do not send a Host request header and therefore leave %{HTTP_HOST} blank.
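In other words (a quick sketch using your domain names, untested here), the first ruleset would become:

RewriteEngine on
# Rewrite some domains to dest site keeping pages
# Skip requests that arrive with no Host header (true HTTP/1.0 clients)
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^www\.keeppages1\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages2\.com [OR]
RewriteCond %{HTTP_HOST} ^www\.keeppages3\.com
RewriteRule (.*) http://www.maindomain.com/$1 [R=301,L]

mod_rewrite ANDs the non-[OR] condition with the [OR] group that follows it, so the rule fires only when the Host value is non-blank and matches one of the three listed hostnames.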
The special handling of HTTP/1.0 clients is really directed more toward preventing errors on your server; since a true HTTP/1.0 client won't provide a Host header in its request, it cannot access anything but the default server on a name-based virtual hosting server anyway. While many search engines "publish" that they are using HTTP/1.0, they really are not; you can be sure that if they list your name-based, virtually-hosted site, they are sending a Host header and are therefore capable of handling HTTP/1.1.
Jim
Loosely speaking, search engines have shown evidence of 'getting aggravated' when too many URLs all end up at the same page with a 200-OK response. For that reason, I recommend using HTTP response codes as intended: if a page is gone, then the server response should indicate that.
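For example (a sketch only, with a hypothetical filename), a page that has been removed with no replacement can be answered with 410-Gone using mod_rewrite's G flag instead of being redirected to the root:

# Hypothetical removed page: return 410-Gone rather than a redirect
RewriteRule ^old-page\.htm$ - [G,L]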
Jim