RewriteEngine On
RewriteBase /
#1 - Redirect requests for old URLs to new URLs
RewriteRule ^old-page\.html?$ http://www.example.com/new-folder/new-page [R=301,L]
# Then repeat the above 80 times.
#2 - Redirect index.html or .htm in any directory to root of that directory and force www
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1? [R=301,L]
#3 - Redirect all .html requests to .htm on canonical host.
RewriteRule ^([^.]+)\.html$ http://www.example.com/$1.htm [R=301,L]
#4 - Redirect direct client request for old URL with .htm extension
# to new extensionless URL if the .htm file exists
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+\.htm\ HTTP/
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(([^/]+/)*[^.]+)\.htm$ http://www.example.com/$1 [R=301,L]
#5 - Redirect any request for a URL with a trailing slash to extensionless URL
# without a trailing slash unless it is a request for an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)/$ http://www.example.com/$1 [R=301,L]
#6 - Redirect requests for non-www/ftp/mail subdomain to www subdomain.
RewriteCond %{HTTP_HOST} !^(www|ftp|mail)\.example\.com$
RewriteRule ^([^.]+)$ http://www.example.com/$1 [R=301,L]
#7 - Internally rewrite extensionless URL request
# to .htm file if the .htm file exists
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.htm [L]
Rule 2: The Condition has [^\ ]* that allows for appended trailing junk or parameters after the index filename. I would alter the Rule pattern to allow index requests with trailing junk to also redirect and for the junk to be stripped.
[...] (.*) to something more specific and you can get rid of at least the second Condition.
Should this rule also strip parameters in the redirect if they were requested?
Rule 7: I'm not sure whether the -f test is a good idea or not. Both valid and invalid requests trigger the -f test, which looks at the filesystem to see if the file exists. Valid requests then look at the filesystem a second time to fetch that file. The two filesystem accesses make valid requests slightly slower. If the Condition were removed, all requests would look at the filesystem only once, and the file would either be served or Apache would generate a 404 error to say it didn't exist. There's a difference in the error message though. With the -f test present the error would say that "/this-stuff" does not exist, but without the -f test the error would be that "/this-stuff.htm" does not exist, exposing that you're using rewrites to static .htm files.
Rule 1: I would remove the closing $ so that old .htm URLs, whether requested as .htm or .html and with or without appended junk, also redirect. Should this rule also strip parameters if they were requested? The Apache default action is to re-append them. Removing parameters is as simple as adding a question mark to the end of the rule target.
#1 Redirect requests for old URLs to new URLs
RewriteRule ^old-page\.htm http://www.example.com/new-folder/new-page? [R=301,L]
# Then repeat the above 80 times.
Rule 2: The Condition has [^\ ]* that allows for appended trailing junk or parameters after the index filename. I would alter the Rule pattern to allow index requests with trailing junk to also redirect and for the junk to be stripped.
#2 Redirect index requests in any directory to root of that directory, removing trailing parameters, forcing 'http://www.'
RewriteRule ^(([^/]+/)*)index([^\w\-]+[^\ ]*)?$ http://www.example.com/$1? [NC,R=301,L]
Rule 8: I would allow URLs with trailing junk to also be redirected to the new URL. I think I would also strip parameters in the redirect.
Looking at this rule
RewriteRule ^(([^/]+/)*[^.]+)\.html?$
I realized there's yet another possible malformed request:
example.com/blahblah//.html
So I guess the second grouping bracket needs to be [^./] after all. I don't know whether the server interprets // in this location as a null file (an error of some sort, surely?) or as a file called "/.html". In the specific case of .htm or .html you're in the clear, because the server has already blocked requests beginning in .ht (I looked in MAMP's config file; that's the wording).
#8 Redirect remaining .htm or .html requests to extensionless URL, removing trailing parameters, forcing 'http://www.'
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+\.html?[^\ ]*\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*[^.]+)\.html?[^\ ]*$ http://www.example.com/$1? [NC,R=301,L]
Rule 5: Should this rule also strip parameters in the redirect if they were requested?
I don't know whether $1 is more efficient than %{REQUEST_URI}. I'd go with the longer form unless there's a big difference in server efficiency, just so I don't have to keep looking back: "What $1? Which rule is this again?"
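One practical difference to watch when swapping between the two forms, sketched here for the folder exclusion used in this thread: in .htaccess the path seen by the RewriteRule pattern (and therefore $1) has no leading slash, while %{REQUEST_URI} begins with one. The two variants below are alternatives, not a pair to use together, and the folder names are just the placeholders used elsewhere in the thread.

```apache
# Variant A: test the backreference. In .htaccess, $1 has no leading slash.
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule ^(([^/]+/)*[^.]+)/$ http://www.example.com/$1 [R=301,L]

# Variant B: test the server variable. REQUEST_URI starts with a slash,
# so the pattern needs one too.
RewriteCond %{REQUEST_URI} !^/(shopping-cart-folder|site-stats-folder)/
RewriteRule ^(([^/]+/)*[^.]+)/$ http://www.example.com/$1 [R=301,L]
```

Variant B reads more clearly on its own, at the cost of remembering the extra slash.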
#5 Redirect requests with trailing slash to extensionless URL, forcing 'http://www.', if request is a valid .htm file, excluding specific folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+/\ HTTP/
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^.]+)/$ http://www.example.com/$1 [R=301,L]
#11 Redirect requests with trailing invalid characters to extensionless URL, removing trailing parameters, forcing 'http://www.', excluding specific file types and folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*[\w\-/]*)?[^\w\-/\ ]+[^\ ]*\ HTTP/
RewriteCond %{REQUEST_URI} !\.(css|gif|jpe?g|png|js|ico|xml|txt)$ [NC]
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule ^((([^/]+/)*[\w\-/]*)?)[^\w\-/\ ]+[^\ ]*$ http://www.example.com/$1? [R=301,L]
#9 Redirect requests with trailing query string to extensionless URL, removing trailing parameters, forcing 'http://www.', excluding specific folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Can't Rule 13 be expressed as ^([^.]*)$ so you don't have to put all those non-page extensions in a Condition? At this point you've already redirected all requests for .htm/.html
Rule 13: Should this rule also strip parameters in the redirect if they were requested?
#13 Redirect https: requests to 'http://www.' if request is a valid .htm file or directory, excluding specific folders and file
RewriteCond %{SERVER_PORT} ^443$
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteCond $1 !^file1
RewriteCond %{REQUEST_FILENAME}.htm -f [NC,OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^([^.]*)[^\ ]*$ http://www.example.com/$1? [R=301,L]
Rule 6: I think the second Condition is redundant. Stripping parameters in this redirect may cause problems elsewhere without a lot of messing about. I'd put up with a redirection chain for some requests, as you have it now.
#6a Redirect https: requests for non-www and non-webmail subdomains to 'https://www.' if request is a valid non-.htm file, excluding specific folders
RewriteCond %{HTTP_HOST} !^(www|webmail)\.example\.com$ [NC]
RewriteCond %{SERVER_PORT} ^443$
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule (.*) [example.com...] [R=301,L]
#6b Redirect http: requests for non-www and non-webmail subdomains to 'http://www.' if request is a valid .htm file, non-.htm file, or directory
RewriteCond %{HTTP_HOST} !^(www|webmail)\.example\.com$ [NC]
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{REQUEST_FILENAME}.htm -f [NC,OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
Rule 7: I'm not sure whether the -f test is a good idea or not… With the -f test present the error would say that "/this-stuff" does not exist, but without the -f test the error would be that "/this-stuff.htm" does not exist, exposing that you're using rewrites to static .htm files.
#7 Internally rewrite extensionless URL requests to .htm file
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^.]+[^./]\ HTTP/
RewriteRule ^([^.]+[^./])$ /$1.htm [L]
At some point, you'll renumber your blocks of rules. The convention I use is 11 onwards for rules that block access, 21 onwards for redirects and 31 onwards for rewrites. I also subdivide 11.a, 11.b, etc where merited.
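As a rough illustration of that numbering convention, the file might be laid out like this. The rules are borrowed from earlier in the thread; which rule gets which number is an assumption made here for the example.

```apache
RewriteEngine On

# 11 onwards: rules that block access
# 11.a, 11.b, ... (blocking rules would go here)

# 21 onwards: external redirects
# 21 Redirect requests for old URLs to new URLs
RewriteRule ^old-page\.htm http://www.example.com/new-folder/new-page? [R=301,L]

# 31 onwards: internal rewrites
# 31 Internally rewrite extensionless URL requests to .htm file
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^.]+[^./]\ HTTP/
RewriteRule ^([^.]+[^./])$ /$1.htm [L]
```

Keeping the gaps between 11, 21 and 31 leaves room to add rules to a group without renumbering everything after it.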
Now, what if someone comes in with a request for an extension you don't use at all? At one time I had a global [NS] block on requests ending in .php just because a 403 Forbidden is so much more satisfying than a 404.
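The exact rule used for that block isn't shown in the thread, so the following is only a guess at its general shape, assuming the site serves no PHP at all:

```apache
# Refuse any request whose path ends in .php with a 403 Forbidden.
# "-" means "no substitution"; F sends the 403; NS skips internal subrequests.
# The query string is not part of the matched path, so
# /something.php?this=that is caught as well.
RewriteRule \.php$ - [NS,F]
```

A rule like this belongs near the top of the file, ahead of the redirects, so that unwanted requests are refused before any other rule does work on them.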
If it's a directory:
example.com/valid-directory//.html gives me a 403 Forbidden. That happened no matter what changes I made.
Adding [^\ ]* tests for "not a space" characters, and it is only relevant when it is added to a RewriteCond that is testing THE_REQUEST. It's the wrong thing to add in other places. It's looking for the space before HTTP in the literal request from the browser:
GET /something.php?this=that&something=theother HTTP/1.1
Rule 8 - the Condition purposely tests THE_REQUEST so that the pattern will match only when something was requested as a URL from somewhere out there on the web, and not as the result of a prior internal rewrite. This prevents an infinite loop.
The Rule pattern (Rule not Condition) can be simplified from ^(([^/]+/)*[^.]+)\.html?[^\ ]*$ to ^(([^/]+/)*[^.]+)\.htm with no trailing $. This matches .htm and .html and .htm<anything>.
#8 Redirect remaining .htm or .html requests to extensionless URL, removing trailing invalid characters and parameters, forcing 'http://www.'
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+\.html?[^\ ]*\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*[^.]+)\.htm http://www.example.com/$1? [NC,R=301,L]
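The loop that the THE_REQUEST test guards against can be traced roughly as follows. This sketch pairs Rule 8 with the internal rewrite from Rule 7, with the THE_REQUEST Conditions deliberately left off both, and /page.htm is a hypothetical file used only for the walkthrough.

```apache
# Sketch only: Rule 8 and Rule 7 WITHOUT their THE_REQUEST conditions.
# Do not deploy this pair as written.
RewriteRule ^(([^/]+/)*[^.]+)\.htm http://www.example.com/$1? [NC,R=301,L]
RewriteRule ^([^.]+[^./])$ /$1.htm [L]

# 1. Client requests /page.htm   -> first rule answers 301 to /page
# 2. Client requests /page       -> second rule rewrites internally to /page.htm
# 3. In .htaccess the rules run again on the rewritten path page.htm
#                                -> first rule matches and answers 301 to /page
# 4. Client requests /page again -> steps 2 and 3 repeat indefinitely
#
# THE_REQUEST still reads "GET /page HTTP/1.1" at step 3, so a Condition
# testing it fails there and the internal rewrite is allowed to stand.
```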
#11 Redirect requests with trailing invalid characters to extensionless URL, removing trailing parameters, forcing 'http://www.', excluding specific file types and folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*[\w\-/]*)?[^\w\-/\ ]+[^\ ]*\ HTTP/
RewriteCond %{REQUEST_URI} !\.(css|gif|jpe?g|png|js|ico|xml|txt)$ [NC]
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule ^((([^/]+/)*[\w\-/]*)?)[^\w\-/]+ http://www.example.com/$1? [R=301,L]
Rule 2 - This wouldn't redirect /indexation if there were a literal escaped period after "index" in the pattern.
The pattern ^<rest of pattern>index\.htm with no trailing $ matches URL requests with anything after htm: an l, or any type of appended junk. Likewise, the pattern ^<rest of pattern>index\. will match "index dot anything".
I would simplify the Rule pattern from ^(([^/]+/)*)index([^\w\-]+[^\ ]*)?$ to ^(([^/]+/)*)index\.htm with no trailing $. This matches .htm and .html and .htm<anything>.
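Written out, the simplified version might look like the sketch below. The THE_REQUEST Condition is adapted from the original Rule 2 at the top of the thread and is an assumption about what would accompany the shorter pattern, not something the suggestion itself specifies.

```apache
#2 Redirect index.htm / index.html requests (with or without trailing junk)
# to the root of that directory, stripping any query string
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.htm[^\ ]*\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*)index\.htm http://www.example.com/$1? [NC,R=301,L]
```

Because of the literal \.htm, this form leaves /index, /indexation and /index33 alone, which may or may not be what is wanted.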
#2 Redirect index requests in any directory to root of that directory, removing trailing invalid characters and parameters, forcing 'http://www.'
RewriteRule ^(([^/]+/)*)index([^\w\-]+[^\ ]*)?$ http://www.example.com/$1? [NC,R=301,L]
http://www.example.com/index
http://www.example.com/index.
http://www.example.com/index/.
http://www.example.com/index./
http://www.example.com/index.htm
http://www.example.com/index,htm
http://www.example.com/index.abc
http://www.example.com/index/abc
http://www.example.com/index/.,/
http://www.example.com/index?;.?/
http://www.example.com/indexhtm
http://www.example.com/index-htm
http://www.example.com/index33
http://www.example.com/indexation
Rule 6a and 6b - Is there any request that can lead to an infinite http-https-http-https loop? The rules don't look "symmetrical" and "opposite".
This is normal in shared-hosting setups because they need to ensure that nobody gets into an .htaccess or .htpasswd file. Well, you'd do it on your own server too, only then you might not have htaccess files to protect.
This thread has been going on for quite a while, so I can no longer remember if you're testing in a WAMP-or-similar setup. If yes, take a closer look at the default config file. If you find a rule involving .ht try commenting it out and see if that affects your rule. If yes, it means that you can't go any further. If no, keep looking.
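For reference, the stock httpd.conf usually carries a block like one of the following. The exact wording depends on the Apache version and on how the bundled package was configured, so treat this as what to look for and not as what your file necessarily contains.

```apache
# Apache 2.4 style
<Files ".ht*">
    Require all denied
</Files>

# Apache 2.2 style
<FilesMatch "^\.ht">
    Order allow,deny
    Deny from all
    Satisfy All
</FilesMatch>
```

Either form denies any request whose final path segment starts with .ht, which is why a request ending in /.html gets a 403 before the rewrite rules have any say in the response.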
Or ignore the problem and proceed on the assumption that you will not get an awful lot of typo requests for /valid-directory//.html
But then why doesn't my index rule #2 require a condition to prevent the infinite loop?
#2 Redirect index requests in any directory to root of that directory, removing trailing invalid characters and parameters, forcing 'http://www.'
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index([^\w\-]+[^\ ]*)?\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*)index([^\w\-]+[^\ ]*)?$ http://www.example.com/$1? [NC,R=301,L]
I removed it in the first place because it's identical to the pattern.
It's supposed to be. You're not testing the content, you're testing its source.
"The request is for index.php AND this request originated on the outside, rather than inside the present server."
But honestly it seems as if [NS] would do the job, since the specific purpose of this flag is to weed out server-internal requests.
#2 Redirect index requests in any directory to root of that directory, removing trailing invalid characters and parameters, forcing 'http://www.'
RewriteRule ^(([^/]+/)*)index([^\w\-]+[^\ ]*)?$ http://www.example.com/$1? [NS,NC,R=301,L]
RewriteEngine On
RewriteBase /
#1 Redirect requests for old URLs to new URLs
RewriteRule ^old-page\.htm http://www.example.com/new-folder/new-page? [R=301,L]
# Then repeat the above 80 times.
#2 Redirect index requests in any directory to root of that directory, removing trailing
# invalid characters and parameters, forcing 'http://www.'
RewriteRule ^(([^/]+/)*)index([^\w\-]+[^\ ]*)?$ http://www.example.com/$1? [NS,NC,R=301,L]
#8 Redirect remaining .htm or .html requests to extensionless URL, removing trailing
# invalid characters and parameters, forcing 'http://www.'
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+\.html?[^\ ]*\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*[^.]+)\.htm http://www.example.com/$1? [NC,R=301,L]
#5 Redirect requests with trailing slash to extensionless URL, forcing 'http://www.',
# if request is a valid .htm file, excluding specific folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+/\ HTTP/
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^.]+)/$ http://www.example.com/$1 [R=301,L]
#11 Redirect requests with trailing invalid characters to extensionless URL, removing
# trailing parameters, forcing 'http://www.', excluding specific file types and folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*[\w\-/]*)?[^\w\-/\ ]+[^\ ]*\ HTTP/
RewriteCond %{REQUEST_URI} !\.(css|gif|jpe?g|png|js|ico|xml|txt)$ [NC]
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule ^((([^/]+/)*[\w\-/]*)?)[^\w\-/]+ http://www.example.com/$1? [R=301,L]
#9 Redirect requests with trailing query string to extensionless URL, removing trailing
# invalid characters, forcing 'http://www.', excluding specific folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#13 Redirect https: requests to 'http://www.' if request is a valid .htm file or directory,
# excluding specific folders and file
RewriteCond %{SERVER_PORT} ^443$
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteCond $1 !^file1
RewriteCond %{REQUEST_FILENAME}.htm -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^([^.]*)$ http://www.example.com/$1? [R=301,L]
#6a Redirect https: requests for non-www and non-webmail subdomains to 'https://www.'
# if request is a valid non-.htm file, excluding specific folders
RewriteCond %{HTTP_HOST} !^(www|webmail)\.example\.com$ [NC]
RewriteCond %{SERVER_PORT} ^443$
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule (.*) [example.com...] [R=301,L]
#6b Redirect http: requests for non-www and non-webmail subdomains to 'http://www.'
# if request is a valid .htm file, non-.htm file, or directory
RewriteCond %{HTTP_HOST} !^(www|webmail)\.example\.com$ [NC]
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{REQUEST_FILENAME}.htm -f [OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#7 Internally rewrite extensionless URL requests to .htm file
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^.]+[^./]\ HTTP/
RewriteRule ^([^.]+[^./])$ /$1.htm [L]
It looks as if you have Fought The Good Fight. Go have a beer. If any residual issues come trickling in, you can deal with them later. In the last few days of testing, you have probably fed the server more bad requests than it sees in a year in real life!
Except for those pesky apple icon errors.
Huh? What apple icon errors? Have you mentioned them before?
File does not exist: /var/www/vhosts/example.com/httpdocs/apple-touch-icon-precomposed.png
File does not exist: /var/www/vhosts/example.com/httpdocs/apple-touch-icon.png