I recently moved a site from a custom CMS to Expression Engine and am having great difficulty getting redirects to work in .htaccess.
I have about 20 redirects to put in; I have included the first one at the bottom of the .htaccess content. I am still learning my way around .htaccess, and I am sure I am missing something really simple.
Any help would be appreciated!
# secure .htaccess file
<Files .htaccess>
order allow,deny
deny from all
</Files>
# Don't list files in index pages
IndexIgnore *
# EE 404 page for missing pages
ErrorDocument 404 /index.php?/
# Simple 404 for missing files
<FilesMatch "(\.jpe?g|gif|png|bmp)$">
ErrorDocument 404 "File Not Found"
</FilesMatch>
RewriteEngine On
RewriteBase /
# remove the www
RewriteCond %{HTTP_HOST} ^(www\.$) [NC]
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
# Add a trailing slash to paths without an extension
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5}|/)$
RewriteRule ^(.*)$ $1/ [L,R=301]
# Remove index.php
# Uses the "include method"
# http://expressionengine.com/wiki/Remove_index.php_From_URLs/#Include_List_Method
RewriteCond %{REQUEST_URI} !(\.[a-zA-Z0-9]{1,5})$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} ^/(site|search|demo|news|includes|testing|videos|scripts|stuff|blog|botw|about|privacy|newsletter|members|P[0-9]{2,8}) [NC]
RewriteRule ^(.*)$ /index.php?/$1 [L]
# Remove IE image toolbar
<FilesMatch "\.(html|htm|php)$">
Header set imagetoolbar "no"
</FilesMatch>
Redirect 301 /Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing http://www.example.com/news/article/the-man-who-saved-the-world-by-doing-nothing/
[edited by: jdMorgan at 3:39 am (utc) on April 30, 2009]
[edit reason] Removed, de-linked, & examplified URLs [/edit]
Try using a mod_rewrite 301 redirect instead. Insert this line right after the "RewriteBase" line, not at the bottom of the file (directive order and location matter):
RewriteRule ^Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing(.*)$ http://www.example.com/news/article/the-man-who-saved-the-world-by-doing-nothing/$1 [R=301,L]
This is an exact functional replacement for your Redirect directive. Note that anything that follows "Doing_Nothing" in the original URL will be copied to the end of the new URL, just as it would be when using a Redirect directive. If you do not need this functionality, then the rule can be simplified.
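To make the pass-through behaviour concrete, here is a small Python sketch that imitates the rule's matching with the re module, standing in for Apache's regex engine (these patterns use only features both share). It assumes per-directory (.htaccess) context, where the URL-path is matched without its leading slash. The function names and the "exact only" variant are hypothetical, for illustration only.

```python
import re

# Pattern and target from the suggested RewriteRule (per-directory context,
# so the URL-path has no leading slash). (.*) captures anything after the old path.
PASS_THROUGH = r"^Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing(.*)$"
# Hypothetical simplified pattern: anchored at the end, nothing is carried over.
EXACT_ONLY = r"^Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing$"
TARGET = "http://www.example.com/news/article/the-man-who-saved-the-world-by-doing-nothing/"


def redirect_with_pass_through(path):
    """Mimic the rule as posted: whatever follows the old path is appended ($1)."""
    m = re.match(PASS_THROUGH, path)
    return TARGET + m.group(1) if m else None


def redirect_exact_only(path):
    """Hypothetical simplified rule: only the exact old path is redirected."""
    return TARGET if re.match(EXACT_ONLY, path) else None


old = "Articles/Daily/805/1/23/2008/The_Man_Who_Saved_the_World_by_Doing_Nothing"
print(redirect_with_pass_through(old))          # the target URL as-is
print(redirect_with_pass_through(old + "_p2"))  # the target URL with "_p2" appended
print(redirect_exact_only(old + "_p2"))         # None: not redirected
```

In .htaccess terms, the exact-only variant corresponds to ending the pattern with "Doing_Nothing$" and dropping "$1" from the substitution.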
There are many, many errors in the other code. In fact, one of those errors is only non-fatal because a second error prevents the rule from doing anything at all; otherwise, it would have brought down your server. But try the alternative redirect first, and then we can get on to addressing the other stuff.
Jim
That did indeed work like a charm. As for the rest of the code, it was generated automatically by the LG .htaccess Generator plugin for Expression Engine, which is used to remove the index.php segment that follows the domain name in standard Expression Engine installs.
I am still learning my way around .htaccess and would love to learn more about where the code has problems and how to correct them.
Thanks again,
Jeff
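Next, the FilesMatch pattern "(\.jpe?g|gif|png|bmp)$" is incorrect -- again, it's deficient rather than invalid. It should be "\.(jpe?g|gif|png|bmp)$" instead, so that the literal period in the pattern applies to all filetypes. As posted, the period belongs only to the "jpe?g" alternative, so any filename that merely ends in "gif", "png", or "bmp", with no period before it, also matches.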
Important: Be sure to change all broken pipe "¦" characters you see in the code here to solid pipe characters before use; Posting on this forum modifies the pipe characters.
# Simple 404 for missing files
<FilesMatch "\.(jpe?g|gif|png|bmp)$">
ErrorDocument 404 "File Not Found"
</FilesMatch>
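The difference is easy to check outside Apache. This Python sketch runs both patterns (with solid pipes) through the re module, used here only as a stand-in for Apache's regex engine, against a few made-up filenames:

```python
import re

# As posted: "\." sits inside the group, so it only belongs to the jpe?g
# alternative; gif, png and bmp can match without a period in front.
OLD = re.compile(r"(\.jpe?g|gif|png|bmp)$")
# Corrected: the literal period applies to every alternative.
NEW = re.compile(r"\.(jpe?g|gif|png|bmp)$")

for name in ["photo.jpg", "logo.png", "nogif", "backup_png", "page.html"]:
    print(f"{name:12} old={bool(OLD.search(name))} new={bool(NEW.search(name))}")
```

With the original pattern, names such as "nogif" and "backup_png" match; with the corrected one they do not, while real image files match either way.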
Jim
# Externally redirect to remove "www."
RewriteCond %{HTTP_HOST} ^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
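For comparison with the generated "remove the www" rule, here is a Python sketch of both host conditions. It assumes the re module behaves like Apache's regex engine for these patterns, and www_target is a hypothetical helper that only mimics how the new rule builds http://%1/$1. Note that the generated condition, ^(www\.$), only matches a host that is literally "www.", so it never fires; if it did, its target (http://%{HTTP_HOST}%{REQUEST_URI}) would point straight back at the same URL.

```python
import re

# Condition from the generated file: "www." must be the whole host,
# so a real hostname such as www.example.com never matches.
GENERATED = re.compile(r"^(www\.$)", re.IGNORECASE)

# Replacement condition: group 1 (%1) is the hostname without "www.",
# with any trailing dot and port number left outside the group.
CANONICAL = re.compile(r"^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$", re.IGNORECASE)


def www_target(host, path):
    """Hypothetical helper: the URL the new rule would redirect to, or None."""
    m = CANONICAL.match(host)
    if m is None:
        return None
    return f"http://{m.group(1)}/{path}"  # http://%1/$1


print(GENERATED.match("www.example.com"))       # None: the old rule never fires
print(www_target("www.example.com", "about/"))  # http://example.com/about/
print(www_target("example.com", "about/"))      # None: already canonical
```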
# Add a trailing slash to URL-paths without an extension
RewriteCond $1 !(\.[a-z0-9]+|/)$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ http://%2/$1/ [R=301,L]
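Here is a rough Python sketch of what the combined trailing-slash rule computes. The !-f test needs the real filesystem, so it is replaced by a hypothetical is_file flag; everything else mirrors the two pattern conditions and the http://%2/$1/ substitution, again using Python's re as a stand-in for Apache's regex engine.

```python
import re

# RewriteCond $1 !(\.[a-z0-9]+|/)$ [NC]: skip paths ending in an extension or "/"
HAS_EXT_OR_SLASH = re.compile(r"(\.[a-z0-9]+|/)$", re.IGNORECASE)
# RewriteCond %{HTTP_HOST} ...: group 2 (%2) is the hostname without "www." or port
HOST = re.compile(r"^(www\.)?([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$")


def add_trailing_slash(host, path, is_file=False):
    """Return the redirect target, or None if the rule would not fire.

    is_file is a hypothetical stand-in for the !-f check.
    """
    if HAS_EXT_OR_SLASH.search(path) or is_file:
        return None
    m = HOST.match(host)
    if m is None:
        return None
    return f"http://{m.group(2)}/{path}/"  # http://%2/$1/


print(add_trailing_slash("www.example.com", "about"))  # http://example.com/about/
print(add_trailing_slash("example.com", "styles.css"))  # None: has an extension
```

Because the substitution uses %2 rather than the requested host, a "www." request without a trailing slash goes straight to the canonical URL in one step.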
After this, there are at least three more problems... And that is one reason I removed the link to the "htaccess generator" -- it was likely a noble attempt, but it produces fairly awful code.
Jim
[edited by: jdMorgan at 10:12 pm (utc) on April 30, 2009]
# Add a trailing slash to URL-paths without an extension
RewriteCond $1 !(\.[a-z0-9]+|/)$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ [%2...] [R=301,L]
Is this block a combination of the previous "remove www" and "add trailing slash" rules? In other words, should it replace what was previously broken into two blocks of code?
Possibly it was this typo, where the substitution used %1 instead of $1 for the path (already corrected above to prevent it from spreading):
# Externally redirect to remove "www."
RewriteCond %{HTTP_HOST} ^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://%1/%1 [R=301,L]
It should be:
# Externally redirect to remove "www."
RewriteCond %{HTTP_HOST} ^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
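The only difference between the two versions is which backreference fills in the path. In mod_rewrite, %1 refers to the first group captured by the last matched RewriteCond, while $1 refers to the first group of the RewriteRule pattern. The hypothetical Python helper below expands both substitution strings for a sample request to show the effect:

```python
import re

# RewriteCond pattern: group 1 (%1) is the hostname without "www."
COND = re.compile(r"^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$", re.IGNORECASE)
# RewriteRule pattern: group 1 ($1) is the requested URL-path
RULE = re.compile(r"^(.*)$")


def expand(host, path, substitution):
    """Hypothetical helper: fill %1 and $1 in a substitution string."""
    cond = COND.match(host)
    rule = RULE.match(path)
    if cond is None or rule is None:
        return None
    return substitution.replace("%1", cond.group(1)).replace("$1", rule.group(1))


print(expand("www.example.com", "about/", "http://%1/%1"))  # http://example.com/example.com
print(expand("www.example.com", "about/", "http://%1/$1"))  # http://example.com/about/
```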
Jim
By the way, we cross-posted above: my "Yes" answer applied to your post before last, not to the one asking whether one new rule replaced the two old ones.
It does not. The new first rule must also redirect to the correct domain -- because otherwise, you could get two sequential redirects from a link to a non-canonical-and-no-trailing-slash URL, and lose page ranking as a result. Search engines happily pass PageRank/link-popularity through one redirect, but after more than one, don't count on it.
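To illustrate why the slash rule must target the canonical host, and why rule order matters, here is a Python sketch that counts external redirects for one request. Each pass applies the first rule that fires (as with [R=301,L]); the browser then requests the new URL and the rules run again. It is a simplification under stated assumptions: the !-f check is ignored, and slash_same_host is a hypothetical rule that keeps whatever host was requested.

```python
import re

WWW = re.compile(r"^www\.([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$", re.IGNORECASE)
ANY_HOST = re.compile(r"^(www\.)?([^.:]+(\.[^.:]+)+)\.?(:[0-9]+)?$")
EXT_OR_SLASH = re.compile(r"(\.[a-z0-9]+|/)$", re.IGNORECASE)


def remove_www(host, path):
    """The "remove www" rule: redirect to the host without "www." (%1)."""
    m = WWW.match(host)
    return (m.group(1), path) if m else None


def slash_same_host(host, path):
    """Hypothetical slash rule that keeps the requested host."""
    return None if EXT_OR_SLASH.search(path) else (host, path + "/")


def slash_canonical(host, path):
    """The combined slash rule: also redirects to the host without "www." (%2)."""
    m = ANY_HOST.match(host)
    if EXT_OR_SLASH.search(path) or m is None:
        return None
    return (m.group(2), path + "/")


def count_redirects(rules, host, path, limit=10):
    """Follow external redirects; each pass applies the first rule that fires."""
    hops = 0
    while hops < limit:
        for rule in rules:
            result = rule(host, path)
            if result is not None:
                host, path = result
                hops += 1
                break
        else:  # no rule fired: this is the final URL
            return hops, f"http://{host}/{path}"
    raise RuntimeError("redirect loop")


print(count_redirects([remove_www, slash_same_host], "www.example.com", "about"))
print(count_redirects([slash_canonical, remove_www], "www.example.com", "about"))
```

The first call takes two hops to reach http://example.com/about/; the second reaches it in one. Putting remove_www ahead of slash_canonical also costs two hops, because the www rule fires first and leaves the slash for a second redirect.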
Jim
The majority of that code is fairly horrible, and the order of the individual rules means that for certain requests there can be an unwanted two- or three-step redirection chain.
Right now, we're hung up temporarily -- hopefully on something simple like fixing a broken pipe.
Jim
[edited by: jdMorgan at 12:37 am (utc) on May 1, 2009]
The "remove www" rules seem to work well; however, once I add the "add trailing slash" rules into the mix, one of two things happens, depending on where they are placed in the .htaccess file:
1. It either brings the site down completely (placed after the remove www), or
2. Causes the site to perform like a dog until things eventually stand still (placed before the remove www).
At the moment the site is using a mixture of the original .htaccess plugin along with the 301 rewrite rules provided by Jim and it is working. When I get a bit more time, I would like to go back over Jim's code and the server logs and work out what I was missing. I have no doubt it was something I was messing up!
Thanks again,
Jeff
This problem would likely be trivial to fix, given the information available with those basic tools.
Jim