Well, I just added a set of two rules to redirect any URI that contains a query string to the same URI without it (duplicate content?), and removed the www from the domain. Everything seems to be working fine, but I was wondering whether anything is wrong with the code or could be improved... as I have learned, the way I do things usually works but isn't the best way!
ErrorDocument 404 /404.php
Options +FollowSymLinks
RewriteEngine On
# Externally Redirect containing ? and slash
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*)/ http://example.com/${lc:$1}? [R=301,L]
#
# Externally Redirect containing ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*) http://example.com/${lc:$1}? [R=301,L]
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?d=([a-zA-Z0-9-]+)
RewriteRule ^view\.php$ http://example.com/${lc:%1}? [R=301,L]
#
# Redirect "with index" to "lowercased uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with index and upper-case" to "lowercased uri without index"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/$ http://example.com/%1 [R=301,L]
#
# Externally redirect to convert uppercase to lowercase
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.+) http://example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://example.com/$1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to view.php
RewriteRule ^([a-z0-9-]+)$ view.php?d=$1 [L]
# Externally Redirect containing ? and slash
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*)/ http://example.com/${lc:$1}? [R=301,L]
#
# Externally Redirect containing ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*) http://example.com/${lc:$1}? [R=301,L]
--
It strips out anything after a ? in any URI. Is this the best way to do it? The RewriteConds check for a ?, and the rule removes the text after it and does a 301 to the proper URL, which is also lowercased; if there is a trailing slash, it is removed.
%{QUERY_STRING} . is all that condition needs to test, where the "dot" means "contains at least one character".
No need for the * in that condition. No need for the ( ) as you are not reusing that backreference data elsewhere in the rule.
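A minimal sketch of that simplification, assuming the rest of the rule stays as originally posted and that the "lc" map is actually defined on the server:

```apache
# Redirect any request that carries a non-empty query string.
# A single "." is enough to test that QUERY_STRING is not empty,
# and no parentheses are needed because nothing is back-referenced.
RewriteCond %{QUERY_STRING} .
# The trailing "?" in the substitution clears the query string.
RewriteRule (.*) http://example.com/${lc:$1}? [R=301,L]
```

Note that this variant only fires when there is at least one character after the question mark.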
Your line with THE_REQUEST might not be needed at all, or could be simplified by not using .* but testing for "everything that is not a question mark, up to a question mark" instead.
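A request with a trailing question mark but no query data leaves %{QUERY_STRING} empty, so a test on %{QUERY_STRING} alone would miss it:
GET /foo.html? HTTP/1.1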
So for this application only, I'd suggest keeping the RewriteCond testing %{THE_REQUEST}, but deleting the RewriteCond testing %{QUERY_STRING}, so that URLs with both blank and non-blank query data are detected.
And as g1smd observes, using ".*" is inefficient and should be avoided when possible. You can speed up processing by using:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
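Putting those two points together, the query-stripping rule might look like the sketch below. It assumes the original substitution is kept unchanged and that the "lc" map exists on the server.

```apache
# Match any client request whose request line contains a "?",
# whether or not any query data follows it.
# [^?]* scans up to the first "?" without the backtracking that ".*" needs.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
# The trailing "?" in the substitution clears the query string.
RewriteRule (.*) http://example.com/${lc:$1}? [R=301,L]
```

Because the test is on the raw request line, a request such as /foo.html? with nothing after the question mark is redirected as well.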
Jim
At the moment my .htaccess contains:
--
ErrorDocument 404 /404.php
Options +FollowSymLinks
RewriteEngine On
#
# Redirect to remove multiple slash within URL-path
RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule .* http://www.example.com${lc:%1}/${lc:%2} [R=301,L]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/${lc:%1} [R=301,L]
#
# Redirect anything with ? and slash
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*)/ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*) http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?d=([a-zA-Z0-9-]+)
RewriteRule ^view\.php$ http://www.example.com/${lc:%1}? [R=301,L]
#
# Redirect "with index" to "lowercased uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with index and upper-case" to "lowercased uri without index"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/$ http://www.example.com/%1 [R=301,L]
# Externally redirect to convert uppercase to lowercase
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.+) http://www.example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to view.php
RewriteRule ^([a-z0-9-]+)$ view.php?d=$1 [L]
--
I'm also wondering: is the whole
"Redirect "with index and upper-case" to "lowercased uri without index"" section really needed, since the section above it already redirects anything with index to the lowercased version, and it would still match any uppercase values because it uses the NC flag?
And in case anyone is wondering, ${lc:$1} is a tolower function that returns the lowercased text, from this original thread:
[webmasterworld.com...]
the ${lc:$1} is a tolower function to return the lowercased text
Only if you or your host has defined "lc" as a RewriteMap in the server configuration file, and has mapped it to the server's operating system's "tolower" function.
This is by no means a "standard" or default feature among hosts.
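For reference, a map like this is normally declared with a single directive. The sketch below assumes access to the main server configuration or a virtual host block, since RewriteMap is not permitted in .htaccess; the map name "lc" simply matches the one used in the rules above.

```apache
# httpd.conf or <VirtualHost> context only (not valid in .htaccess)
RewriteEngine On
# Define a map named "lc" backed by mod_rewrite's internal tolower function
RewriteMap lc int:tolower

# Once defined, .htaccess rules can call the map, for example:
# RewriteRule (.+) http://www.example.com/${lc:$1} [R=301,L]
```

On a host where you cannot edit the server configuration, the host would need to add the RewriteMap line for you.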
Jim
I removed a few rules that I thought were not necessary, since the cases they handled were already being fixed by previous rules, and this is what I have now:
--
# Redirect anything with ? and slashes
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*)/+ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*) http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?d=([a-zA-Z0-9-]+)
RewriteRule ^view\.php$ http://www.example.com/${lc:%1}? [R=301,L]
#
# Redirect "with index" to "lowercased uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/+$ http://www.example.com/%1 [R=301,L]
#
# Redirect to convert uppercase to lowercase
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.+) http://www.example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/+$ http://www.example.com/$1 [R=301,L]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/${lc:%1} [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to view.php
RewriteRule ^([a-z0-9-]+)$ view.php?d=$1 [L]
--
If I enter http://www.example.com/folder/// (folder is a real physical folder), nothing happens: the extra trailing slashes are not removed.
Strangely, if I enter http://www.example.com/FOLDER///
it is redirected to http://www.example.com/folder/ with the extra slashes removed.
Also, if I enter http://www.example.com/FOLDER///? then it removes all the slashes... I need some way to check whether it's a directory with just one slash and, if so, remove the ? and do a proper redirect.
mod_rewrite is killing me, help please.
The first two rules above are actually meant to be:
# Redirect anything with ? and slashes
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^([^?]*)/+$ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^([^?]*)$ http://www.example.com/${lc:$1}? [R=301,L]
I can't seem to edit my post.
#
# Redirect "with / and upper-case" to "lowercased uri without /"
# If the trailing-slash-stripped URL-path contains any uppercase letters
RewriteCond $1 [A-Z]
# get lowercase-converted trailing-slash-stripped URL-path to %1 from trailing-slash-stripped $1
RewriteCond ${lc:$1} ^(.+)$
# And if the requested lowercased, trailing-slash-stripped URL-path does not resolve to an existing directory
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
# Then redirect to lowercased, trailing-slash-stripped, canonical-domain URL
RewriteRule ^(.+)/+$ http://www.example.com/%1 [R=301,L]
Realizing that you're likely in a hurry to finish this project and deploy your solution, I still have to say that you *cannot* just copy and paste server configuration code -- And that is what mod_rewrite code is. If you do not understand every single jot and tittle, or you are not sure what effects the code might have on search engine behaviour, then please don't use the code.
Treat it as something dangerous like a chainsaw: Very useful, but potentially deadly if not used properly, and with attention to detail at all times. If you hit a knot in the wood, the saw can jump up like a snake and cut your throat. Or the whole darn tree may fall on you if you focus on the details but don't keep an eye on the overall situation.
I hope you tested the lowercasing on your production server. If you didn't, you should do so now, because it is not likely to work --as stated in my previous post-- unless you have added a RewriteMap to your server configuration (e.g. httpd.conf or conf.d) files, or have gotten your host to do so.
One further comment: Your new "^([^?]*)$" pattern is very strange -- and not really suitable for use in RewriteRule, because a question mark cannot occur in the URL-path tested by RewriteRule. While it is good to avoid ".*" patterns when more-specific patterns can be used, as previously-discussed, there are just some cases where what is actually needed is the ".*" pattern, and if so, then that is what should be used.
In fact, the most-frequently occurring case for legitimate use of ".*" is in just such cases: Where ".*" stands alone or followed by one or a very few additional characters or a character-group in the pattern.
So, I suggest you delete or comment-out the "directory-exists-check" RewriteCond, and use "^(.*)$" instead of "^([^?]*)$" in your RewriteRules.
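Applied to the two query-stripping rules, that suggestion might look like the following sketch. Only the RewriteRule patterns change; everything else is assumed to stay as posted, including the "lc" map.

```apache
# Redirect anything with ? and trailing slashes
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^(.*)/+$ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^(.*)$ http://www.example.com/${lc:$1}? [R=301,L]
```

The question mark is still tested, but only in THE_REQUEST, which is the one place in these rules where it can actually appear.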
Jim
However, as you state, I'm not ready to use mod_rewrite code at this level, with so many rules, just yet. It's just too much for me at the moment, so I'm going to use some simple rules that return 404s if the URL is not exactly what is needed, instead of redirecting everything to the correct URL. At the moment I've changed everything to this:
--
# Redirect uri with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^([^?]*)$ http://www.example.com/$1? [R=301,L]
#
# Redirect "with index" to "uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/$1 [R=301,L,NC]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/%1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Rewrite artist/albumname(album) to album.php
RewriteRule ^([a-z0-9-]+)/([a-z0-9-]+)\(album\)$ album.php?a=$1&p=$2 [L]
--
I also just wanted to ask: my host has added the RewriteMap to the httpd.conf for the site. Is it okay to leave it there since I'm not using it, or should I ask him to remove it?