removed file extensions - Now 301 redirect ALL Pages?

301 redirects for old file extensions

         

Traffic_Act

9:59 am on Jan 25, 2010 (gmt 0)



Hi,

I changed my .htaccess file today so that my pages no longer display the .html extension. I have also updated all links on my site, so all pages and links are now free of the .html extension.

Now, of course, I will have a duplicate-content issue, because my old .html URLs will still be out there somewhere.

Do I need to set up a 301 redirect for each and every page, or is there a quicker way?

Below is the code I added to my .htaccess file to remove the file extensions. Perhaps something can be added to this part of the code?

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

Hope someone can help.

Jen

g1smd

10:26 am on Jan 25, 2010 (gmt 0)




Yes, you need a 301 redirect before that.

RewriteRule (.+)\.html?$ http://www.example.com/($1) [R=301,L]

You'll need a RewriteCond before the new rule to detect that it was a direct client request, otherwise it will loop.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP

The code above will redirect both .html and .htm requests.

However, you will also need to exclude URLs matching the pattern google[^\.]+\.html? from being redirected, as that is a valid search engine verification file URL.

Think carefully about any other such URLs that must also remain as .html entities.
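Putting those pieces together, the redirect might look something like the sketch below. It assumes mod_rewrite is already enabled with RewriteEngine On, writes the substitution as plain $1, and uses a second RewriteCond on %{REQUEST_URI} as one possible way to express the google verification-file exclusion; that condition is an illustration, not code posted in this thread, and should be tested before use.

```apache
# Only act on direct client requests for .html/.htm URLs (avoids a redirect loop
# with the internal rewrite that adds .html back on).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP
# Assumed exclusion: leave google*.html / google*.htm verification files alone.
RewriteCond %{REQUEST_URI} !/google[^.]+\.html?$
# Externally redirect to the extensionless URL on the canonical hostname.
RewriteRule (.+)\.html?$ http://www.example.com/$1 [R=301,L]
```

Any other URLs that must stay as .html could be excluded with further negated RewriteCond lines of the same shape.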

jdMorgan

2:51 pm on Jan 25, 2010 (gmt 0)




The original rule could do with an efficiency tweak and a correction:

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(([^/]*/)*[^/.]+)$ /$1.html [L]

Note that the leading slash on the RewriteRule substitution path may cause problems on some servers. However, its purpose is to prevent malicious path-injection, so this code should be tested as-is, and that slash should only be removed if it causes problems.

Jim

g1smd

4:58 pm on Jan 25, 2010 (gmt 0)




Remove the parentheses from my ($1) code.

They should not be there. That's my typo.

Traffic_Act

1:39 am on Jan 26, 2010 (gmt 0)



Thanks for those suggestions. I have just pasted the code below into my .htaccess file.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP
RewriteRule (.+)\.html?$ http://www.example.com/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

Funnily enough, all but three pages are now redirecting. What would the reason for that be?

I have the same problem with my code that redirects the non-www version to the www version (see code below). All but three pages redirect; one of the three that do not is the home page, which is of course the most important one to redirect.

Does anyone have any suggestions?

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

[edited by: jdMorgan at 1:48 am (utc) on Jan. 26, 2010]
[edit reason] example.com [/edit]

jdMorgan

1:49 am on Jan 26, 2010 (gmt 0)




What is your home page's URL?
What is its filepath?
These are different, and we need to know both.

Rule order may also be coming into play. Order your rules with all external redirects first, in order from most-specific (fewest URLs affected) to least-specific (e.g. domain canonicalization redirect), followed by your internal rewrites, again in order from most- to least-specific.
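As a rough sketch of that ordering, using only the rules already posted in this thread (with example.com standing in for the real domain, and assuming nothing else in the file interferes):

```apache
RewriteEngine On

# 1. External redirect, most specific: direct client requests for .html/.htm URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP
RewriteRule (.+)\.html?$ http://www.example.com/$1 [R=301,L]

# 2. External redirect, least specific: canonicalize non-www to www
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 3. Internal rewrite last: map the extensionless URL to the .html file on disk
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(([^/]*/)*[^/.]+)$ /$1.html [L]
```

With the redirects first, a request is sent to its final canonical URL before any internal rewrite can run, so a rewritten filepath is never exposed in a redirect.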

You may also have an Alias or a ScriptAlias at work here. If so, we need to know that.

AcceptPathInfo and MultiViews can also throw a spanner in the gears...

Jim

Traffic_Act

10:29 am on Jan 27, 2010 (gmt 0)



That is going over my head a bit. Can I send you a private message with all the info you need?

jdMorgan

3:51 pm on Jan 27, 2010 (gmt 0)




We cannot provide "free private consulting," so let's keep the discussion here please. Do you have a specific question about what I posted?

Jim

Traffic_Act

2:28 am on Jan 28, 2010 (gmt 0)



I have changed the order around, and it has solved only some of the issues with some of my pages.
I am not familiar with ScriptAlias or MultiViews, so I am not sure how to check for them.
Would it be appropriate to post my entire .htaccess file here for you to have a look at?

jdMorgan

2:34 pm on Jan 28, 2010 (gmt 0)




Aliasing is done at the server configuration level. If the three pages that you still have problems with are 'shared scripts', then aliasing is potentially related to your trouble.

You can disable MultiViews using
Options -MultiViews
as long as your site does not depend on content-negotiation.
Similarly, you can disable AcceptPathInfo (if you are hosted on Apache 2.x) as long as your scripts don't depend on it using
AcceptPathInfo Off

We prefer to avoid large 'code dumps' here for three reasons. First, we don't do 'review my code' services here. Second, the longer a post is, the less likely it is that anyone will read it. Third, if the code contains uniquely-identifying information and reveals a security flaw, you open up your site to attack simply by posting here. We are set up here to discuss Apache configuration and usage in a way that is useful to both current and future readers of the threads, and not to serve as a "help desk."

Again, I suggest that you order your rules with all external redirects first, in order from most-specific (fewest URLs affected) to least-specific (e.g. domain canonicalization redirect), followed by your internal rewrites, again in order from most- to least-specific.

If you're sure you understood that and implemented it correctly, but it still doesn't help, then remove all uniquely-identifying information from your code (change the domain name to "example.com" and modify any specific URL-path names, etc.), remove all unrelated lines of code, and post it.

Jim

Traffic_Act

10:47 pm on Jan 28, 2010 (gmt 0)



Thanks very much for your help. It's very much appreciated.

Changing the order sorted everything out. In the end, it was a caching issue that had kept me from seeing that everything was now working properly.

Thanks again.