RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ [%{HTTP_HOST}...] [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.htm$ [%{HTTP_HOST}...] [R=301,L]
- I have no experience of using .htaccess before today!
1. It doesn't work for index files in folders. It only works for the root.
2. It doesn't fix the domain to be www, so in combination with your other rule that does, it will create a redirection chain. This rule should fix the URL to be www as well as remove the index filename, all in one step.
The code you need is in a thread active yesterday and today, here in this forum.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html$ [domain.com...] [R=301,L]
It still doesn't want to work!
Here's the current .htaccess file:
RewriteEngine on
RewriteOptions MaxRedirects=40
RewriteBase /
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule ^(.*)$ /joinus.htm? [L,R=301]
RewriteCond %{QUERY_STRING} ^a=g$
RewriteRule ^(.*)$ /example.html? [L,R=301]
RewriteRule ^(.+\.html?$) /cgi-bin/example.cgi [NC,L]
RewriteRule ^sitemap.xml$ /cgi-bin/example.cgi?a=sX [NC,L]
RedirectMatch 301 /links/(.*) http://www.example.com/notfound.shtml
Redirect gone /$myEmail
Redirect gone /&usg=ALkJrhgz6MRDcFkp-kcCoKgNS9ERG-CLtQ
### Redirect the non www version to the www version ###
Options +FollowSymLinks
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/ [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.example.com/$1? [R=301,L]
Any idea what I'm missing here?
In all, it seems you may be copying and pasting code without fully understanding it. Be warned that since this is server configuration code, that is a recipe for disaster.
Did you completely flush your browser cache before testing?
Options +FollowSymLinks
RewriteEngine on
RewriteOptions MaxRedirects=3
# RewriteBase /
#
# Return 410-Gone for myEmail URLs
RewriteRule myEmail - [G]
#
# Return 410-Gone for specific query string
RewriteCond %{QUERY_STRING} &usg=ALkJrhgz6MRDcFkp-kcCoKgNS9ERG-CLtQ
RewriteRule .* - [G]
#
# Internally rewrite links URLs to non-existent path to force a 404-Not Found response
RewriteRule ^links/ /some-path-that-does-not-exist [L]
#
# Externally redirect request with specific query strings
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule .* /joinus.htm? [R=301,L]
RewriteCond %{QUERY_STRING} ^a=g$
RewriteRule .* /example.html? [R=301,L]
#
# Externally redirect direct client requests for "<any-directory>/index.html" and
# "<any-directory>/index.htm" to "<any-directory>/"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.example.com/$1? [R=301,L]
#
# Internally rewrite specific URLs to example.cgi
RewriteRule ^([^/]*/)*[^.]+\.html?$ /cgi-bin/example.cgi [L]
RewriteRule ^sitemap\.xml$ /cgi-bin/example.cgi?a=sX [L]
#
# Externally redirect the non www hostname to the www hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Externally redirect to fix up FQDN and appended port numbers
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.¦:[0-9]*) [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
In several cases, you've got internal rewrites that accept variant URLs -- for example, the "htm" and "html" variations. Be advised that best practices indicate that you should canonicalize these variant URLs before rewriting them. In other words, if possible, externally redirect all .htm URLs to .html URLs, and then internally rewrite only .html URLs. This will avoid the so-called "duplicate-content penalties" that result when a single resource (e.g. file) can be reached by more than one URL.
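As an illustration only (assuming the www.example.com hostname and the rule order shown above, with the redirect placed after the index-file redirect but before the internal rewrites to example.cgi), such a canonicalization redirect might look like this:
#
# Externally redirect requests for "<any-directory>/<page>.htm" to "<any-directory>/<page>.html"
RewriteRule ^(([^/]*/)*[^.]+)\.htm$ http://www.example.com/$1.html [R=301,L]
With a rule along those lines in place, the internal rewrite to example.cgi could then be narrowed to match only ".html" instead of ".html?".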
Replace any and all broken pipe "¦" characters above with solid pipe characters before use; Posting on this forum modifies the pipe characters.
Jim
Thank you for the above, it works perfectly.
Can I clarify one thing:
(\.¦:[0-9]*)
Do I need to modify the above in some way? If so, to what?
I have a second question too: the site in question uses a Perl CMS to spit out .htm and .html pages.
I've noticed that if you enter www.example.com/example, the correct 404 is returned.
However, if that is changed to either www.example.com/example.htm or www.example.com/example.html, a 200 is returned.
Is this something that can be addressed in the .htaccess file?
Can I clarify one thing:
(\.¦:[0-9]*)
Do I need to modify the above in some way? If so, to what?
From my post above:
Replace any and all broken pipe "¦" characters above with solid pipe characters before use; Posting on this forum modifies the pipe characters.
I've noticed that if you enter www.example.com/example, the correct 404 is returned. However, if that is changed to either www.example.com/example.htm or www.example.com/example.html, a 200 is returned.
We need more details here: What specifically is wrong with the www.example.com/example.html URL? Does it not exist as a "physically-existing" (static) file, or is there no data in the CMS to generate a page for that URL?
If the latter is true, then the CMS must be modified so that it handles this condition by returning a 404-Not Found or 410-Gone response -- 404 if the data to generate the page never existed, and 410 if it once did exist but has been obsoleted or removed.
An opportunity also exists here for further improvement: If the page once did exist but you intentionally obsoleted it, then the CMS database could support an entry that contains a replacement URL for use when the page has been obsoleted, but you now have another page that addresses the same or a very similar subject. Adding this kind of meta-data to your database allows centralized administration and provides really good support for bogus, obsoleted, or replaced resources.
What you're likely talking about here is a fairly common problem: On script-based sites where almost all URL requests are handled by the script, it is up to the script to take over 404/410/301 response handling for any and all URLs that it accepts for handling.
It's fairly common that this is not properly implemented, even on "professionally-written" CMS packages and shopping carts. This results in potentially-massive duplicate content problems and "shallow" spidering; Once a search engine discovers that your site returns a 200-OK response for just about *any* requested URL, it will artificially limit its spidering depth to avoid your "infinite URL-space" and this can result in sparse/stale search results listings and less-than-optimal search results ranking for your site(s). It is, in short, a very serious problem.
Jim
Let me clarify my post above:
www.example.com/example.html does not exist - either physically or in the CMS - but a 200 is returned, not a (correct) 410.
Both the duplicate content and shallow spidering are problems I'm trying to resolve right now.
The CMS was custom written a while back and my current web guy is still not too familiar with it.
Short of commissioning a new CMS, is there way to resolve the response handling issue via .htaccess, or am I out of luck on this one?
It should not be "too big of a deal" to a programmer familiar with CMSes, databases, and sending server response headers. Find the spot in the CMS code where you can decide if "the requested db entry exists." If it exists, proceed as normal. If not, release the db connection, send an error response to the client, and exit.
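To sketch that decision point, here is roughly what it might look like in a CGI-style Perl script. The lookup_page(), replacement_url(), was_removed(), and render_page() routines are placeholders for whatever your CMS and its database layer actually provide, so treat this as an outline rather than drop-in code:
use strict;
use warnings;
use CGI;

my $q    = CGI->new;
my $path = $ENV{REQUEST_URI} || '/';

# Hypothetical database lookup: returns page data if an entry exists, undef otherwise.
my $page = lookup_page($path);

if (defined $page) {
    # The requested db entry exists: proceed as normal.
    print $q->header(-type => 'text/html');
    print render_page($page);
}
elsif (my $new_url = replacement_url($path)) {
    # Obsoleted, but a replacement URL is recorded (per the replacement-URL meta-data idea above): 301 to it.
    print $q->redirect(-uri => $new_url, -status => '301 Moved Permanently');
}
elsif (was_removed($path)) {
    # The entry once existed but has been removed: return 410-Gone.
    print $q->header(-status => '410 Gone', -type => 'text/html');
    print "<html><body><h1>410 Gone</h1></body></html>";
}
else {
    # No data ever existed for this URL: return 404-Not Found, not a blank 200-OK page.
    print $q->header(-status => '404 Not Found', -type => 'text/html');
    print "<html><body><h1>404 Not Found</h1></body></html>";
}

# Release the db connection here if your CMS holds one open, then exit.
exit;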
Jim
Many threads here reflect a rush past requirements specification to coding, and this often leads to the correct answer to the wrong problem; For example, code is sometimes posted that will work, but that implements something which should not be done at all, for reasons such as search engine ranking or usability.
One of my favorite cartoons from a previous career showed the lead programmer heading upstairs from the "software engineering room." The caption read, "I'll go upstairs and find out what they want. The rest of you start coding!" :)
Jim
As they have no idea what they are doing, most people just jump in and try stuff in the hope that it will work.
I see many things that appear to work (done this myself, several times) but which will cause major problems with other unexpected URL inputs that weren't tested when implemented.
I have learnt to try a huge range of both expected and unexpected URLs when testing code out, but still manage to miss some on occasions.
I have recently revisited some of my old code from a few years ago and spotted some howlers. I have rewritten much of it to be more efficient, and expect to revisit it again later on.