Forum Moderators: phranque

Rewrite .html to .htm

         

crobb305

3:24 am on Jan 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I use .htm extension for my pages. Since Google is trying to access some pages on my site using .html, I guess I need to 301 them to .htm to stop the 404s that are being reported in GWT? Should I bother? If so, can someone help me with the rewrite rule?

Thanks
Chris

g1smd

9:41 am on Jan 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, redirect them, and make sure it is a 301 redirect.

# Setup

RewriteEngine On

# Fix all .html requests to redirect to www and .htm

RewriteRule (.*)\.html$ http://www.example.com/$1.htm [R=301,L]

# Fix non-www requests to redirect to www

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

This is about the simplest ever .htaccess file.

I guess you'll need to add lines for ErrorDocument, DirectoryIndex and Options -Indexes settings too. Read the Apache documentation to find out what they do.
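
For orientation, a minimal sketch of those three settings could look like the block below. The file names (/404.htm and index.htm) are assumptions chosen to match a site that uses .htm, not values given in this thread, and the host must allow these directives in .htaccess via AllowOverride.

```apache
# Sketch only: the file names below are assumptions, not taken from the thread.

# Serve a custom page for "not found" errors (hypothetical /404.htm)
ErrorDocument 404 /404.htm

# File to serve when a directory URL such as "/" is requested
DirectoryIndex index.htm

# Do not generate a file listing for directories that have no index file
Options -Indexes
```

Using a local path for ErrorDocument keeps the 404 status code on the response. A full http:// URL there would make the server issue a redirect instead.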

crobb305

8:19 pm on Jan 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Will this conflict with my existing rule that rewrites index.html to '/'?

#Rewrite index.html to homepage without index.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

AND I already have a non-www to www, but it says nothing about .htm or .html. Do I need to add your rule in addition to the rule below?

# Rewrite non-www to www
RewriteCond %{HTTP_HOST} ^example.(.*)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

It looks like I need all 4 rules (your two, plus my existing two).

g1smd

2:30 am on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Take all of the code I supplied above and add this as the first rule of three:

# Redirect index.html or .htm in any directory to root of that directory and force www

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1? [R=301,L]

This rule must redirect to / for both index.html and index.htm requests in order to avoid a redirection chain if index.html is requested.

The three new rules replace all of your code.

g1smd

2:48 am on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You repeatedly refer to a 'rewrite' above.

Please note that none of these rules are internal rewrites. They are all external redirects.
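
To illustrate the distinction, here is a hedged sketch of the same pattern written both ways. The internal rewrite variant is hypothetical and is not something anyone in this thread recommends. It is commented out so that only the redirect is active.

```apache
RewriteEngine On

# Internal rewrite (local path, no R flag): the server quietly serves
# page.htm for a request to page.html. The browser still shows .html and
# no redirect is sent, so the .html URL keeps working as a duplicate.
# RewriteRule ^(.*)\.html$ /$1.htm [L]

# External redirect (R=301): the server answers with "301 Moved Permanently"
# and a Location header, and the client makes a second request for the .htm URL.
RewriteRule ^(.*)\.html$ http://www.example.com/$1.htm [R=301,L]
```

Only the external redirect tells search engines that the .html URL has been replaced, which is what the original question needs.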

jdMorgan

3:44 am on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You only need three of the four:

# Setup
RewriteEngine On
#
# Redirect all .html requests to .htm on canonical host
RewriteRule ^(.*)\.html$ http://www.example.com/$1.htm [R=301,L]
#
# Redirect direct client requests for "index.html" to root URL "/"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
#
# Redirect non-canonical hostname requests to canonical hostname
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Jim

jdMorgan

3:46 am on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Left browser open too long, and cross-posted as a result... :o

Jim

crobb305

3:50 am on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you. I know they are external redirects, but I have always said "rewrite" (though technically incorrect) because the directives are RewriteCond/RewriteRule. I will implement your suggestions and I appreciate your help.

Chris

g1smd

8:13 am on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's a flaw with jd's code.

A request for index.html will be redirected by the first rule to index.htm and the second rule will never be matched for those requests.

Fix that by swapping the order of rule one and two in his code, or use my code as suggested.
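
As a sketch of the first option, this is jd's code with rules one and two swapped. In the original order, a request for /index.html matches the general .html rule first and is sent to /index.htm. The follow-up request for /index.htm matches neither the .html rule nor the index\.html rule, so it never reaches "/".

```apache
# Setup
RewriteEngine On
#
# 1. Index rule first: /index.html goes straight to "/"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
#
# 2. General rule: every other .html request goes to .htm on the canonical host
RewriteRule ^(.*)\.html$ http://www.example.com/$1.htm [R=301,L]
#
# 3. Redirect non-canonical hostname requests to the canonical hostname
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Note that this swapped version still leaves a direct request for /index.htm alone. The index\.html? pattern in my rule above covers that case as well.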

crobb305

5:43 pm on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



g1smd,

I had already added a rule just above his index.html redirect that further redirects requests for index.htm to canonical:

# Redirect direct client requests for "index.htm" to root URL "/"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.htm
RewriteRule ^index\.htm$ http://www.example.com/ [R=301,L]

followed by

# Redirect direct client requests for "index.html" to root URL "/"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

So basically, I have 4 rules instead of his 3. Everything seems to be resolving on all variants I have tested.

Also, what happens when using a '+' versus {3,9}?

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.htm
versus
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm

crobb305

5:50 pm on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, one thing is happening in Windows (IE 7): when I enter the address WITHOUT specifying 'http://www.' (just entering 'example/page.htm'), I get a popup that tells me "Windows cannot find the page". This happens only on internal pages. When I try to access the homepage, Windows tries to download the file to the hard drive. However, when I try it on my Mac (in Safari and Firefox), it works fine. I must be leaving something out.

g1smd

8:57 pm on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's no need to have two rules, one for index.htm and another for index.html requests.

One rule, with a pattern of

index\.html?
will take care of both requests and be a lot more efficient.

For your other problem, use Live HTTP Headers and check what MIME type is returned for that content. It is likely incorrect or missing.
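
If the header check does show a wrong or missing Content-Type, one possible fix is sketched below. It assumes the cause is a missing extension mapping on the server, which nothing in this thread has confirmed.

```apache
# Assumption: the server is not sending "text/html" for these extensions.
# AddType (mod_mime) maps file extensions to a media type.
AddType text/html .htm .html
```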

The '+' means 'one or more of the preceding characters' and the '{3,9}' means 'between three and nine of them'. Request methods such as GET, HEAD and POST fit either pattern, so it is mostly personal style I guess.
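
A short sketch of the two forms side by side, using the root-level index rule as the example. Only one RewriteCond should be active, so the second is commented out.

```apache
# %{THE_REQUEST} holds the raw request line sent by the client, for example:
#   GET /index.html HTTP/1.1
#
# "+" form: one or more capital letters, then a space
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html?[^\ ]*\ HTTP/
#
# "{3,9}" form: three to nine capital letters, then a space
# RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?[^\ ]*\ HTTP/
#
RewriteRule ^index\.html?$ http://www.example.com/ [R=301,L]
```

Both forms accept GET, HEAD and POST. The bounded form simply refuses a method name shorter than three or longer than nine letters.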

The index rules must be first, next the more general .html rules, and lastly the general non-www to www rules.
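
Put together in that order, the complete file would look roughly like this. It is a sketch that assumes the .htaccess file sits in the document root and that www.example.com is the canonical hostname.

```apache
# Setup
RewriteEngine On

# 1. Redirect index.html or index.htm in any directory to that directory's
#    root URL and force www. The trailing "?" drops any query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1? [R=301,L]

# 2. Redirect all remaining .html requests to .htm and force www
RewriteRule ^(.*)\.html$ http://www.example.com/$1.htm [R=301,L]

# 3. Redirect remaining non-www requests to www, path unchanged
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Because each rule redirects straight to the final www URL, any single request should need at most one 301 to reach its canonical address.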