I changed my .htaccess file today so that the .html extensions of my pages are no longer displayed. I have also changed all links on my site, so all pages and links are now free of the .html extension.
Now, of course, I will have a duplicate content issue, as my old .html pages will still be out there somewhere.
Do I need to set up a 301 redirect for each and every page, or is there a quicker way?
Below is the code I added to my .htaccess file to remove the file extensions. Perhaps something can be added to this part of the code?
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
Hope someone can help.
Jen
RewriteRule (.+)\.html?$ http://www.example.com/$1 [R=301,L]
You'll need a RewriteCond before the new rule to detect that it was a direct client request, otherwise it will loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP
The code above will redirect both .html and .htm requests. However, you will also need to exclude URLs matching the pattern google[^\.]+\.html? from being redirected, as that is a valid search engine verification file URL. Think carefully about any other such URLs that must also remain as .html entities.
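As a rough sketch of the exclusion mentioned above, an extra negated RewriteCond can be placed ahead of the redirect. This assumes the verification file sits in the site root and that the .htaccess file is in the document root; adjust the pattern if either is not the case.

```apache
# Sketch only: leave Google verification files (e.g. /google<id>.html) alone.
# Assumption: the verification file lives in the site root.
RewriteCond %{REQUEST_URI} !^/google[^.]+\.html?$
# Only act on direct client requests, so the internal rewrite cannot loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP
# Redirect /page.html or /page.htm to the extensionless URL.
RewriteRule ^(.+)\.html?$ http://www.example.com/$1 [R=301,L]
```

Any other URL that must keep its .html extension would need its own negated RewriteCond in the same place.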
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(([^/]*/)*[^/.]+)$ /$1.html [L]
Jim
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP
RewriteRule (.+)\.html?$ http://www.example.com/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
Funnily enough, all but three pages are now redirecting. What would the reason for that be?
I have the same problem with my code that redirects the non-www version to the www version (see code below). All but three pages redirect; one of the three that do not is the home page, which is of course the most important one to redirect.
Does anyone have any suggestions?
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
[edited by: jdMorgan at 1:48 am (utc) on Jan. 26, 2010]
[edit reason] exampe.com [/edit]
Rule order may also be coming into play. Order your rules with all external redirects first, in order from most-specific (fewest URLs affected) to least-specific (e.g. domain canonicalization redirect), followed by your internal rewrites, again in order from most- to least-specific.
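To illustrate that ordering, here is a minimal sketch that assembles the rules already posted in this thread. The RewriteEngine line is an assumption (it must be on somewhere for any of this to work), and example.com stands in for the real domain.

```apache
# Assumption: mod_rewrite is available and not already enabled elsewhere.
RewriteEngine On

# 1. External redirects, most specific first.
# Direct client requests for .html/.htm URLs go to the extensionless URL.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+html?\ HTTP
RewriteRule ^(.+)\.html?$ http://www.example.com/$1 [R=301,L]

# Least specific external redirect: domain canonicalization (non-www to www).
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 2. Internal rewrites come last.
# Serve /page from page.html when that file exists and /page is not a directory.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(([^/]*/)*[^/.]+)$ /$1.html [L]
```

With the redirects ahead of the internal rewrite, a request for an old .html URL on the non-www host should be sent straight to the final www, extensionless URL by the first rule, rather than being rewritten internally first.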
You may also have an Alias or a ScriptAlias at work here. If so, we need to know that.
AcceptPathInfo and MultiViews can also throw a spanner in the gears...
Jim
You can disable MultiViews using
Options -MultiViews
as long as your site does not depend on content-negotiation.
Similarly, if you are hosted on Apache 2.x and your scripts don't depend on it, you can disable AcceptPathInfo using
AcceptPathInfo Off
We prefer to avoid large 'code dumps' here for three reasons. First, we don't do 'review my code' services here; second, the longer a post is, the less likely anyone is to read it; and third, if the code contains uniquely-identifying information and reveals a security flaw, you open up your site to attack simply by posting it here. We are set up here to discuss Apache configuration and usage in a way that is useful to both current and future readers of the threads, and not to serve as a "help desk."
Again, I suggest that you order your rules with all external redirects first, in order from most-specific (fewest URLs affected) to least-specific (e.g. domain canonicalization redirect), followed by your internal rewrites, again in order from most- to least-specific.
If you're sure you understood that and implemented it correctly, but it still doesn't help, then remove all uniquely-identifying information from your code (change the domain name to "example.com" and modify any specific URL-path names, etc.), remove all unrelated lines of code, and post it.
Jim
Changing the order sorted everything out. In the end, it was a caching issue that kept me from seeing that everything was now working properly.
Thanks again.