Forum Moderators: open

Message Too Old, No Replies

Combining exclusion and permanent redirection in .htaccess

A thorny problem


Arnett

4:55 am on Sep 8, 2003 (gmt 0)

10+ Year Member



I had a serious problem on my server. When I first got on the web in '98 I started with virtual hosting. My URL was:

http://www.my-web-host.com/my-folder/

Over time I set up all of my sites in directories like:

[my-web-host.com...]

If the site performed well then I would license a domain like:

http://www.new-site.com

The new site would basically just be pointed at the same files. I now have three domains that have "grown up" this way. I realized that Google was indexing all of the files both under the virtual path and the domain path. This was causing duplicate content penalties for my domains. The more files I added to the site the worse the PR and ranking seemed to get. Follow this with weeks of brainstorming with my webhost.

What we first decided to do was to disallow the virtual paths at the webhost root level. Then we decided to add permanent redirection to the .htaccess files so that Google would:

1 - Delete the [my-web-host.com...] listings from their index

2 - Follow the permanent redirection directive to the files in [new-site.com...] and change all references to the virtual name files to the domain name.

Logically, this seems to be an airtight solution. It solves the duplicate content problem and also causes the two sets of listings to be combined into one set referencing only the domains.

Will it work as outlined? I'm not an Apache or Google expert. You're the people to ask.

Arnett

3:03 am on Sep 10, 2003 (gmt 0)

10+ Year Member



posting to move topic to top...

jdMorgan

3:39 am on Sep 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What we first decided to do was to disallow the virtual paths at the webhost root level.

Not good - then you can't fix the problem by using .htaccess. The robots need to be able to access the virtual paths in order to receive the redirect.
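To make that concrete (using the hypothetical folder name from above), a root-level robots.txt entry like the following is exactly what must not be in place, because it stops Googlebot from requesting the old URLs and therefore from ever seeing the 301:

```
User-agent: *
Disallow: /my-folder/
```

Any such Disallow lines covering the redirected paths should be removed before the redirect goes live.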

Then we decided to add permanent redirection to the .htaccess files so that Google would:

1 - Delete the [my-web-host.com...] listings from their index

2 - Follow the permanent redirection directive to the files in [new-site.com...] and change all references to the virtual name files to the domain name.

Logically, this seems to be an airtight solution. It solves the duplicate content problem and also causes the two sets of listings to be combined into one set referencing only the domains.

Will it work as outlined?

First, send e-mail to everyone who links to you and ask them to change their links.
Second, change the internal links on your site to point to the discrete domains.
Next, make sure that all the search engines pick up on the new links.
(You may have already done all the above - I'm trying to be thorough)
Lastly, add the following code to your .htaccess file in the top-level web root. Note that this will only work if there is an exact correspondence between your folder names and your domain names. If such a correspondence does not exist, you'll have to handle each case individually.


RewriteEngine On
# HTTP_HOST holds only the hostname - it never includes the "http://" scheme
RewriteCond %{HTTP_HOST} ^(www\.)?my-web-host\.com [NC]
# Capture the top-level folder name and reuse it as the domain name
RewriteRule ^([^/]+)/(.*)$ http://www.$1.com/$2 [R=301,L]

This code redirects from
www.my-web-host.com/any-domain-folder-name/any_page to
www.any-domain-folder-name.com/any_page
using a 301-Moved Permanently redirect.
It will work with or without "www." in the requested URL, and the hostname match is case-insensitive thanks to the [NC] flag; the path portion, however, is matched case-sensitively.

As long as you have some or all of the pages of the discrete domains already indexed in the search engines, the above will accomplish what you want. It will tell them to forget about the subdirectory-based URLs and use the discrete domains instead.

Basically, it maps your folders to your domains directly, and copies the requested page path, too.
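As a sanity check before deploying, the folder-to-domain mapping the rule performs can be simulated in Python (the folder and page names below are made up for illustration):

```python
import re

# Simulate the RewriteRule pattern: ^([^/]+)/(.*)$ -> http://www.$1.com/$2
# Folder and page names here are hypothetical.
FOLDER_RULE = re.compile(r"^([^/]+)/(.*)$")

def map_to_domain(path):
    """Map 'folder/page' to 'http://www.folder.com/page'; None if no match."""
    m = FOLDER_RULE.match(path)
    if m is None:
        return None  # request is not inside a site folder, so no redirect
    return "http://www.{0}.com/{1}".format(m.group(1), m.group(2))

print(map_to_domain("new-site/about/index.html"))
# -> http://www.new-site.com/about/index.html
```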

Note that this is almost the exact inverse of what you are doing now when you map your discrete domain names to your subdirectories, either with DNS or with a transparent redirect. For that reason, you may need to change the names of your existing subdirectories slightly to prevent an infinite loop. You will need to do this at the same time you install the code above.
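One way to sketch the loop fix, assuming the folder for www.new-site.com (a hypothetical domain) is renamed to new-site-files: map the discrete domain to the renamed folder with an internal rewrite, guarded so it cannot re-fire on its own output:

```
# Internal rewrite: serve www.new-site.com from the renamed folder.
# The REQUEST_URI guard prevents the rule from matching its own output.
RewriteCond %{HTTP_HOST} ^(www\.)?new-site\.com [NC]
RewriteCond %{REQUEST_URI} !^/new-site-files/
RewriteRule ^(.*)$ /new-site-files/$1 [L]
```

Because the renamed folder no longer matches a domain name, the 301 rule above never fires for it, and the loop is broken.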

Jim

Arnett

3:53 am on Sep 10, 2003 (gmt 0)

10+ Year Member



Thanks Jim.

I'm not worried about anyone who links to the old URLs. Nobody does. If they do then that's just that. Google won't show them if they have a PR4 or less anyway.

My main concern is that I want Google to drop the old URLs from its index and follow the permanent redirect to the new URLs, taking whatever PR the old URLs had to the new URLs. They are the same files anyway. If the "duplicate content" penalty means that the PR is split between the two URLs, I want them combined.