RewriteEngine On
RewriteBase /
#1 - Redirect requests for old URLs to new URLs
RewriteRule ^old-page\.html?$ http://www.example.com/new-folder/new-page [R=301,L]
# Then repeat the above 80 times.
#2 - Redirect index.html or .htm in any directory to root of that directory and force www
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1? [R=301,L]
#3 - Redirect all .html requests to .htm on canonical host.
RewriteRule ^([^.]+)\.html$ http://www.example.com/$1.htm [R=301,L]
#4 - Redirect direct client request for old URL with .htm extension
# to new extensionless URL if the .htm file exists
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+\.htm\ HTTP/
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(([^/]+/)*[^.]+)\.htm$ http://www.example.com/$1 [R=301,L]
#5 - Redirect any request for a URL with a trailing slash to extensionless URL
# without a trailing slash unless it is a request for an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)/$ http://www.example.com/$1 [R=301,L]
#6 - Redirect requests for non-www/ftp/mail subdomain to www subdomain.
RewriteCond %{HTTP_HOST} !^(www|ftp|mail)\.example\.com$
RewriteRule ^([^.]+)$ http://www.example.com/$1 [R=301,L]
#7 - Internally rewrite extensionless URL request
# to .htm file if the .htm file exists
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.htm [L]
On rule #4: didn't rule group #1 already take care of any requests for the old URLs?
On the 80 repeats of #1: is there no pattern at all that will allow you to collapse the redirects into a smaller number of rules?
You may find it cleaner to rewrite (not redirect) to a PHP script that does the lookup and issues the redirect.
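A rough sketch of that hand-off, in case it helps. The script name /redirect-lookup.php and its "old" parameter are made up for illustration; the script itself would have to map the old path to the new URL, send the 301, and deal with paths it doesn't recognise.

```apache
# Sketch only: /redirect-lookup.php is a hypothetical script, not something from this thread.
# The THE_REQUEST condition limits this to URLs the client actually asked for,
# so paths produced by a later internal rewrite to .htm are not caught.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\.html?[?\ ]
RewriteRule ^([^.]+)\.html?$ /redirect-lookup.php?old=$1 [L]
```

As written this would catch every .htm/.html request, so in practice the rule pattern would need narrowing to whatever the old URLs have in common.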
For #3, it's simpler to redirect ^(.+\.htm)l to $1.
That's assuming your paths contain literal periods so you can't use [^.]. Your other rules suggest they don't, so you can use the same formulation here.
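Put together, rule #3 might then read something like this (a sketch using the [^.] form, which assumes no periods anywhere in the path before the extension):

```apache
#3 - Redirect all .html requests to .htm on canonical host.
# The capture runs up to and including ".htm"; the trailing "l" is left out of $1.
RewriteRule ^([^.]+\.htm)l$ http://www.example.com/$1 [R=301,L]
```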
Why do you need rule #5? Have you been getting requests with a spurious directory slash? There's quite a long list of "rules you don't need unless you need them", and this would seem to qualify.
RewriteEngine On
RewriteBase /
#1 - Redirect requests for old URLs to new URLs
RewriteRule ^old-page\.htm$ http://www.example.com/new-folder/new-page [R=301,L]
# Then repeat the above 80 times.
#2 - Redirect index.html or .htm in any directory to root of that directory and force www
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1? [R=301,L]
#6 - Redirect requests for non-www/ftp/mail subdomain to www subdomain.
RewriteCond %{HTTP_HOST} !^(www|ftp|mail)\.example\.com$
RewriteCond %{HTTP_HOST} example\.com
RewriteRule ^([^.]+)$ http://www.example.com/$1 [R=301,L]
#7 - Internally rewrite extensionless URL request
# to .htm file if the .htm file exists
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.htm [L]
Depending on what happens in your server, it may be necessary (horrid possibility!) to preface every single one of your 80 redirects with a RewriteCond looking at THE_REQUEST. But let's not borrow trouble!
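If it does come to that, a guarded redirect might look like the sketch below. It assumes the worry is an internally rewritten path matching the redirect on a later pass through the rules. THE_REQUEST holds the original request line, so the condition only matches what the client actually asked for.

```apache
# Sketch: only redirect when the client itself requested /old-page.htm
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /old-page\.htm[?\ ]
RewriteRule ^old-page\.htm$ http://www.example.com/new-folder/new-page [R=301,L]
```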
Rule #7 will probably run faster if you express the pattern as
^([^.]+[^./])$
Here too it may be necessary to include a condition looking at THE_REQUEST. More necessary, possibly, since it's an internal rewrite. If so, it should come before the existing condition.
A slightly modified Rule 4 could achieve the same result.
The code in Rule 6 doesn't match the plain-English description in the comment. As coded, the first RewriteCond stops the redirect when any of the www. or ftp. or mail. sub-domains are requested by HTTP. Is the comment what you want to do and the code wrong, or is the code right and the comment wrong?
Is putting the THE_REQUEST condition first meant to save resources? To keep #7 from doing a file check on every single request that hits it?
On "Rule #7 will probably run faster if you express the pattern as ^([^.]+[^./])$":
"It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail." (Knuth)
To me the rule says: If the request is for a named subdomain other than www or ftp or mail, then redirect to www. Come to think of it, was this rule intended for domain-name canonicalization? If so, use the ordinary
!^(www\.example\.com)?$
pattern. (One condition only.) I don't think you need to say anything about mail or ftp at all, unless those really are named subdomains accessed via http.
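Spelled out as a complete rule, that canonicalization might look like the sketch below. The (.*) rule pattern is an assumption on my part, chosen so every path is covered. The optional group means a request with an empty Host header is left alone rather than redirected.

```apache
#6 - Redirect any host other than www.example.com to the www host
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```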
If more than one condition has to be met, list them starting with most likely to fail. Not much use making the server run through a long list of things that apply to 1/10 of all requests if the last thing on the list only applies 1/1000 of the time.
If any one of a group of conditions has to be met, list them starting with most likely to succeed.
In each case the object is simply to let the rewrite engine finish its stuff and get out of there sooner.
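As a sketch of both orderings: the first block reuses rule #7 and assumes the THE_REQUEST test is the one that fails more often, so the file check is skipped for most requests. The second uses placeholder host names purely to show the [OR] form. Keep in mind the RewriteRule pattern itself is tested before any of its conditions.

```apache
# AND (the default): conditions are checked in order and stop at the first failure,
# so the filesystem check below only runs if the THE_REQUEST test passes.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^.]+[^./]\ HTTP/
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^([^.]+[^./])$ /$1.htm [L]

# OR: once one condition succeeds the remaining ORed ones are skipped,
# so the host seen most often goes first. Host names are placeholders.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```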
None of these patterns was a clear performance winner, and the difference between them was always measured in nanoseconds, not even microseconds. That difference is so small that for all practical purposes, there is no performance difference.
Run Xenu LinkSleuth over the site and check for errors.
Also construct a text file list of "good" and "bad" URLs. Duplicate the whole lot for both non-www and www versions. In LinkSleuth set the scan depth to the lowest possible then import that list of URLs and check you get the right results.
It always reports: error code: 503 (temporarily overloaded). I wonder if Google sees that as a broken link? It's strange. The link works. But it always reports as 503.
I was very surprised at how many error combinations resulted in 200-responses that shouldn't have.
Is the exact wording "temporarily overloaded" coming from Xenu or from the server? It's more precise than the definition of 503 ("service unavailable").
As a human, can you get to the page "cold" by simply typing in the URL, or do you have to follow some kind of procedure?
The reason you feed both good and bad URLs to Xenu in the text file list is to test that the site returns the correct response for both correct and incorrect requests, for wanted and unwanted requests. You add a selection of page names that don't exist, incorrect extensions, unwanted or unnecessary parameters, and so on. Some of my test files have thousands of URLs and can quickly verify that I haven't introduced problems when altering the site configuration.
RewriteEngine On
RewriteBase /
#1 Redirect requests for old URL to new URL
RewriteRule ^old-page\.htm$ http://www.example.com/new-folder/new-page [R=301,L]
# Then repeat the above 80 times.
#2 Redirect index requests in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [NC,R=301,L]
#8 Redirect remaining .htm or .html requests to extensionless URL
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+\.html?\ HTTP/ [NC]
RewriteRule ^(([^/]+/)*[^.]+)\.html?$ http://www.example.com/$1 [NC,R=301,L]
#9 Redirect URLs containing valid characters to remove query string except for specific folders
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
#10 Redirect URLs containing valid characters to remove trailing invalid characters
RewriteRule ^([/0-9a-z._\-]*)[^/0-9a-z._\-] http://www.example.com/$1 [NC,R=301,L]
#11 Redirect URLs containing valid characters to remove trailing punctuation
RewriteRule ^(.*)[^/0-9a-z]+$ http://www.example.com/$1 [NC,R=301,L]
#5 Redirect requests with trailing slash to extensionless URL unless a directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+/\ HTTP/
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(([^/]+/)*[^.]+)/ http://www.example.com/$1 [R=301,L]
#6 Redirect requests for non-www and non-webmail subdomains to www subdomain
RewriteCond %{HTTP_HOST} !^(www|webmail)\.example\.com$ [NC]
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#13 Redirect https requests to http except for specific file types, folders, and file
RewriteCond %{SERVER_PORT} ^443$
RewriteCond $1 !\.(css|gif|jpe?g|bmp|png|js|ico|xml|txt)$ [NC]
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteCond $1 !^file1
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#7 Internally rewrite extensionless URL requests to .htm file if .htm file exists
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^.]+[^./]\ HTTP/
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^([^.]+[^./])$ /$1.htm [L]
#8 Redirect remaining .htm or .html request to extensionless URL if file exists as an .htm version
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+\.html?\ HTTP/ [NC]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(([^/]+/)*[^.]+)\.html?$ http://www.example.com/$1 [NC,R=301,L]
Do rules 10 and 11 really work as you expect? At first glance, the patterns look ambiguous and prone to mismatching.
I think rule 13 is in the wrong place/wrong order.
http://www.example.com/.
http://www.example.com?
http://www.example.com/?
#11 Redirect URLs containing valid characters to remove trailing punctuation
RewriteRule ^(.*)[^/0-9a-z]+$ http://www.example.com/$1 [NC,R=301,L]
#8 Redirect remaining .htm or .html request to extensionless URL if file exists as an .htm version
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+\.html?\ HTTP/ [NC]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(([^/]+/)*[^.]+)\.html?$ http://www.example.com/$1 [NC,R=301,L]
That will only work for .htm requests though, not .html requests. I'm wondering if I can modify the file-exists check to work for .html requests also? As far as I can tell, it looks like I'll need to add a rule above #8 to convert .html requests to .htm first.
I think I would be happier if....
#11 Redirect URLs containing valid characters to remove trailing punctuation
RewriteRule ^(.*)[^/0-9a-z]+$ http://www.example.com/$1 [NC,R=301,L]
Seems like you should be able to say
RewriteCond %{REQUEST_FILENAME}l? -f
Here, again, you're getting into Things You Don't Need Until You Need Them territory. Just how many requests do you get that end in .html? but that refer to files which never existed in the first place?
Any literal . immediately after the / seems to disappear. Question marks remain, but are ignored, as are .? combinations.
^([\w/-]+(\.\w+)?)?.+
I think about when I email a link to somebody: I usually paste the link, then immediately follow it with a period, comma, etc. I don't know how easy it is for software to mess that up and lump the punctuation in with the link.
I thought of that but didn't think it could possibly work:
RewriteCond %{REQUEST_FILENAME}l? -f
# Redirect URL containing valid characters to remove trailing characters
RewriteRule ^([\w/-]+(\.\w+)?)?.+ http://www.example.com/$1 [R=301,L]
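It's causing the last character of a valid URL to get truncated, so example.com/page is 301 redirecting to example.com/pag. The trailing .+ has to consume at least one character, so when nothing invalid follows the URL, the capture group backtracks and gives up its final character.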
If it's example.com/page.html, then adding l? would be adding a second l that's an optional l. So you'd be left with example.com/page.html or example.com/page.htmll. Neither is a valid file.
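One point on the l? idea: the first argument of a RewriteCond is a test string that gets expanded, not a regular expression, so the ? would be taken literally rather than as "optional". A possible alternative, sketched below on the assumption that the .htaccess file sits in the document root, is to build the .htm path from the rule's own backreference. That way the check is the same whether the request ended in .htm or .html.

```apache
#8 Redirect .htm or .html request to extensionless URL if the .htm file exists
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/\ ]+/)*[^.\ ]+\.html?\ HTTP/ [NC]
# $1 is the extensionless path captured by the RewriteRule pattern below
RewriteCond %{DOCUMENT_ROOT}/$1.htm -f
RewriteRule ^(([^/]+/)*[^.]+)\.html?$ http://www.example.com/$1 [NC,R=301,L]
```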
RewriteRule ^(([^./]+/)*[^./]+\.(html|php))/ http://www.example.com/$1 [R=301,L]
# Redirect URL containing valid characters to remove trailing invalid characters
RewriteRule ^([\w/-]+(\.\w+)?)?[^a-zA-Z\d].* http://www.example.com/$1 [R=301,L]
# Redirect URL containing valid characters to remove trailing invalid characters except for specific file types and folders
RewriteCond $1 !\.(css|gif|jpe?g|bmp|png|js|ico|xml|txt)$ [NC]
RewriteCond $1 !^(shopping-cart-folder|site-stats-folder)/
RewriteRule ^([/0-9a-z_\-]*)[^/0-9a-z_\-]+$ http://www.example.com/$1 [NC,R=301,L]