Problems with .htaccess 301 Redirects

Redirect 301 in .htaccess not working - Rewrite not working either

         

hawkinsmultimedia

1:33 am on Apr 30, 2009 (gmt 0)

10+ Year Member



Hi,

I recently moved a site from a custom CMS to ExpressionEngine and am having great difficulty getting redirects to work in .htaccess.

I have about 20 redirects to put in - I have included the first one at the bottom of the .htaccess content. I am still learning my way around .htaccess, and I am sure I am missing something real simple.

Any help would be appreciated!


# secure .htaccess file
<Files .htaccess>
order allow,deny
deny from all
</Files>

# Dont list files in index pages
IndexIgnore *

# EE 404 page for missing pages
ErrorDocument 404 /index.php?/

# Simple 404 for missing files
<FilesMatch "(\.jpe?g¦gif¦png¦bmp)$">
ErrorDocument 404 "File Not Found"
</FilesMatch>

RewriteEngine On

RewriteBase /

# remove the www
RewriteCond %{HTTP_HOST} ^(www\.$) [NC]
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Add a trailing slash to paths without an extension
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}¦/)$
RewriteRule ^(.*)$ $1/ [L,R=301]

# Remove index.php
# Uses the "include method"
# http://expressionengine.com/wiki/Remove_index.php_From_URLs/#Include_List_Method
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5})$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} ^/(site¦search¦demo¦news¦includes¦testing¦videos¦scripts¦stuff¦blog¦botw¦about¦privacy¦newsletter¦members¦P[0-9]{2,8}) [NC]
RewriteRule ^(.*)$ /index.php?/$1 [L]

# Remove IE image toolbar
<FilesMatch "\.(html¦htm¦php)$">
Header set imagetoolbar "no"
</FilesMatch>

Redirect 301 /Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing http://www.example.com/news/article/the-man-who-saved-the-world-by-doing-nothing/

[edited by: jdMorgan at 3:39 am (utc) on April 30, 2009]
[edit reason] Removed, de-linked, & examplified URLs [/edit]

jdMorgan

3:55 am on Apr 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As long as the URL-path on the left side of your Redirect directive does indeed match the page you want to redirect, I can think of only two reasons your Redirect might not work: either mod_rewrite is executing first and sending the request off to /index.php (after adding a trailing slash), or mod_alias is not available to you. The latter would be quite strange, but I suppose it's possible.

Try using a mod_rewrite 301 redirect instead. Insert this line right after the "RewriteBase" line, not at the bottom of the file (directive order and location matter):


RewriteRule ^Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing(.*)$ http://www.example.com/news/article/the-man-who-saved-the-world-by-doing-nothing/$1 [R=301,L]

As with your original directive, this one should appear all on one line -- any line wrap appearing here is an artifact of the limited-width forum page.

This is an exact functional replacement for your Redirect directive. Note that anything that follows "Doing_Nothing" in the original URL will be copied to the end of the new URL, just as it would be when using a Redirect directive. If you do not need this functionality, then the rule can be simplified.
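If nothing ever follows "Doing_Nothing" in the old URLs, a simplified form might look like the sketch below. It assumes the old URL is always requested exactly as shown, with no trailing slash or extra path; the "$" anchor directly after the old path and the fixed target are the only changes from the rule above.

```apache
# Sketch: exact-match 301 redirect; nothing is copied from the old URL-path.
# Assumes the old URL never arrives with a trailing slash or extra path info.
RewriteRule ^Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing$ http://www.example.com/news/article/the-man-who-saved-the-world-by-doing-nothing/ [R=301,L]
```

Any query string on the original request is still appended to the new URL by default.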

There are many, many errors in the other code. In fact, the only reason that one error isn't fatal is that a second error prevents the rule from doing anything at all; otherwise, it would have brought down your server. But try the alternative redirect first, and then we can get on to addressing the other stuff.

Jim

hawkinsmultimedia

4:37 am on Apr 30, 2009 (gmt 0)

10+ Year Member



Thanks Jim,

That did indeed work like a charm. As for the rest of the code - that was generated automatically using the LG .htaccess Generator plugin for ExpressionEngine, used to remove the index.php segment that follows the domain name in standard ExpressionEngine installs.

I am still learning my way around .htaccess and would love to learn more about where the code has problems and what I should do to correct it.

Thanks again,

Jeff

jdMorgan

1:16 pm on Apr 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



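First up, a simple one, though it depends on your Apache version. On Apache 1.3, remove the trailing double-quote character from the ErrorDocument message. It's not invalid, but there the message is marked by the leading double-quote alone, so the trailing quote isn't needed and will appear as a literal character when the error page is displayed. On Apache 2.0 and later, the message is enclosed in a matching pair of quotes, so leave the trailing quote in place. (You can test this easily by requesting a non-existent image like "/foo-bar.gif" from your site, and looking at the result in your browser.)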

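Next, the FilesMatch pattern "(\.jpe?g¦gif¦png¦bmp)$" is incorrect -- again, it's deficient rather than invalid. As written, the literal period applies only to the first alternative, so the pattern also matches any filename that merely ends in "gif", "png", or "bmp". It should be "\.(jpe?g¦gif¦png¦bmp)$" instead, so that the literal period in the pattern applies to all filetypes.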

Important: Be sure to change all broken pipe "¦" characters you see in the code here to solid pipe characters before use; posting on this forum modifies the pipe characters.


# Simple 404 for missing files
<FilesMatch "\.(jpe?g¦gif¦png¦bmp)$">
ErrorDocument 404 "File Not Found
</FilesMatch>

There are several more problems, but I want to address them a few at a time to avoid confusion and to allow me to read some other threads as well... :)

Jim

hawkinsmultimedia

8:51 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



Thanks Jim. I appreciate your help!

jdMorgan

9:32 pm on Apr 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



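Next up, the "remove the www" rule was badly broken: its pattern ^(www\.$) can only match a hostname that is exactly "www.", so the rule never fires, and if it did, redirecting to %{HTTP_HOST} would send the request straight back to the same www hostname. Here is a corrected version: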

# Externally redirect to remove "www."
RewriteCond %{HTTP_HOST} ^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

And the "add trailing slash" rule had three problems: it resulted in thousands of unnecessary disk checks because the RewriteConds were in the wrong order; it was inefficient because [NC] was not used to make the character-range compares case-insensitive; and it could result in a redirect to a non-canonical hostname because a canonical hostname was not specified in the RewriteRule. A corrected version:

# Add a trailing slash to URL-paths without an extension
RewriteCond $1 !(\.[a-z0-9]+¦/)$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ http://%2/$1/ [R=301,L]

Also, the "add trailing slash" rule should precede the "remove www" rule, because it is more specific.

After this, there are at least three more problems... And that is one reason I removed the link to the "htaccess generator" -- it was likely a noble attempt, but it produces fairly awful code.

Jim

[edited by: jdMorgan at 10:12 pm (utc) on April 30, 2009]

hawkinsmultimedia

10:02 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



Not sure what I did wrong, but when I used the code above (correcting the pipe character), it crashed the site. I changed it back to the old code for now and it is working again. Any ideas what I would have done wrong?

Jeff

hawkinsmultimedia

10:13 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



OK - just checking that I am getting this. This set of rules:

# Add a trailing slash to URL-paths without an extension
RewriteCond $1 !(\.[a-z0-9]+¦/)$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ [%2...] [R=301,L]

is a combination of the previous "remove www" and "add trailing slash" - i.e. that block should replace what was previously broken into two blocks of code?

jdMorgan

10:16 pm on Apr 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes,

Possibly this typo (already corrected above to prevent it from spreading).


# Externally redirect to remove "www."
RewriteCond %{HTTP_HOST} ^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://%1/%1 [R=301,L]

Should be:

# Externally redirect to remove "www."
RewriteCond %{HTTP_HOST} ^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

If you still have problems, try using only one of the new rules at a time (along with the old rule for the other function) to narrow down the problem. Also, check your server error log -- some good info in there, usually.

Jim

hawkinsmultimedia

11:35 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



The first rule works and removes the www - however it chokes on the second rule and crashes the site.

jdMorgan

11:46 pm on Apr 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are you still keeping in mind my warning from above?
Important: Be sure to change all broken pipe "¦" characters you see in the code here to solid pipe characters before use; posting on this forum modifies the pipe characters.

The rule will repeatedly add trailing slashes if the correct pipe character isn't used in the RewriteCond pattern. Eventually, either the server or the client will give up and throw an error.
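To make that failure mode concrete, here are the two versions of the condition side by side. This is only a fragment for comparison, not a complete rule, and the broken version is commented out so it cannot be used by accident.

```apache
# Broken bar (as displayed on this forum): "¦" is an ordinary character, not
# alternation, so the group only matches a path ending in an extension that is
# followed by a literal "¦/". Real paths never end that way, so the negated
# condition stays true even for paths that already end in "/", and another
# slash is added on every pass.
# RewriteCond $1 !(\.[a-z0-9]+¦/)$ [NC]

# Solid pipe: the group matches a path ending in an extension OR in "/",
# so the negated condition is false for those paths and no redirect is issued.
RewriteCond $1 !(\.[a-z0-9]+|/)$ [NC]
```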

By the way, we cross-posted above: my "Yes" answered your earlier post asking what you might have done wrong, not the one asking whether one new rule replaced two old ones.

It does not. The new first rule must also redirect to the correct domain -- because otherwise, you could get two sequential redirects from a link to a non-canonical-and-no-trailing-slash URL, and lose page ranking as a result. Search engines happily pass PageRank/link-popularity through one redirect, but after more than one, don't count on it.
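Put together, the ordering described above might look like the sketch below. The conditions and targets are the ones posted earlier in this thread, with solid pipes. One deliberate difference is an untested assumption layered on top of the posted rule: the first rule uses ^(.+)$ rather than ^(.*)$, so that a request for the bare home page, whose path is empty in .htaccess context, is not itself given an extra slash. The request traces in the comments are a reading of the patterns, not captured output.

```apache
# 1. More specific rule first: add the trailing slash AND drop any "www."
#    in the same redirect (%2 is the hostname without the optional "www.").
#    e.g. http://www.example.com/news -> 301 -> http://example.com/news/
#    Assumption: ^(.+)$ instead of the posted ^(.*)$, to skip the empty path.
RewriteCond $1 !(\.[a-z0-9]+|/)$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)$ http://%2/$1/ [R=301,L]

# 2. General rule second: anything still requested on "www." goes to the
#    bare hostname (%1), with the path copied unchanged.
#    e.g. http://www.example.com/news/ -> 301 -> http://example.com/news/
RewriteCond %{HTTP_HOST} ^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

With this order, a link to a www URL that also lacks its trailing slash is corrected by the first rule alone, in a single redirect.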

Jim

g1smd

12:22 am on May 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** As for the rest of the code - that was generated automatically using the LG .htaccess Generator plugin for Expression Engine ***

The majority of that code is fairly horrible, and the order of the individual rules means that for certain requests there can be an unwanted two or three step redirection chain.

jdMorgan

12:37 am on May 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As mentioned above, the first two rules in the original code are reversed, and both rules need to specify the correct canonical domain as the redirect target. This is true of the third rule as well, but since each of these rules has problems requiring discussion, I've been taking them one or two directives/rules/concepts at a time...

Right now, we're hung up temporarily -- hopefully on something simple like fixing a broken pipe.

Jim

[edited by: jdMorgan at 12:37 am (utc) on May 1, 2009]

hawkinsmultimedia

3:42 am on May 1, 2009 (gmt 0)

10+ Year Member



Thanks for all your help. The rewrite rules you set me up with first are working like a charm and we are not losing any of the backlinks we have built up, which is great. However, using the other rules, I just can't get it to play nice with my ExpressionEngine setup. I double- and triple-checked the pipes to see if I had missed anything and there was no problem there.

The "remove www" rules seem to work well - however, once I add the "add trailing slash" rules into the mix, one of two things happens depending on where they are placed in the .htaccess file:

1. The site goes down completely (placed after the remove www), or
2. The site performs like a dog until things eventually stand still (placed before the remove www).

At the moment the site is using a mixture of the original .htaccess plugin along with the 301 rewrite rules provided by Jim and it is working. When I get a bit more time, I would like to go back over Jim's code and the server logs and work out what I was missing. I have no doubt it was something I was messing up!

Thanks again,

Jeff

g1smd

11:52 am on May 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don't leave it in a state of "I think it works" or "it looks like it works" as there are very many things that could be going on 'under the hood' that are silently destroying your search rankings and/or traffic.

jdMorgan

3:02 pm on May 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It would be worthwhile to test with the "Live HTTP Headers" add-on for Firefox/Mozilla to see what transactions are occurring between the browser and the server, and to look at the server error logs.

This problem would likely be trivial to fix, given the information available with those basic tools.
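If you have access to the main server configuration (these directives are not allowed in .htaccess), mod_rewrite's own log can supplement the error log. The following is only a sketch with a placeholder log path: RewriteLog and RewriteLogLevel are the Apache 2.2 directives, and Apache 2.4 replaced them with a per-module LogLevel setting.

```apache
# Apache 2.2, server or virtual-host config only (not .htaccess).
# The log path is a placeholder; use a location your server can write to.
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4 equivalent (RewriteLog no longer exists there):
# LogLevel alert rewrite:trace3
```

Turn the logging back off once the problem is found, since it slows the server down noticeably.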

Jim