Just wanted to make sure code is correct.

         

maxed

9:27 am on Nov 17, 2008 (gmt 0)

10+ Year Member



Okay, thanks to jdMorgan's and g1smd's help, I was able to make a pretty thorough set of RewriteRules to take care of a lot of duplicate content and rewrite problems.

[webmasterworld.com...]

Well, I just added a set of two rules to strip the query string from any URI that contains one (duplicate content?) and removed the www from the domain. Everything seems to be working fine, but I was wondering if there is anything wrong with the code or anything that can be improved. As I have learned, the way I do things usually works but isn't the best way!

ErrorDocument 404 /404.php
Options +FollowSymLinks
RewriteEngine On
# Externally Redirect containing ? and slash
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*)/ http://example.com/${lc:$1}? [R=301,L]
#
# Externally Redirect containing ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*) http://example.com/${lc:$1}? [R=301,L]
#
# Externally redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?d=([a-zA-Z0-9-]+)
RewriteRule ^view\.php$ http://example.com/${lc:%1}? [R=301,L]
#
# Redirect "with index" to "lowercased uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with index and upper-case" to "lowercased uri without index"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/$ http://example.com/%1 [R=301,L]
#
# Externally redirect to convert uppercase to lowercase
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.+) http://example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://example.com/$1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to view.php
RewriteRule ^([a-z0-9-]+)$ view.php?d=$1 [L]

maxed

10:00 am on Nov 18, 2008 (gmt 0)

10+ Year Member



Basically, I just wanted to know if using this is okay:
--

# Externally Redirect containing ? and slash
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*)/ http://example.com/${lc:$1}? [R=301,L]
#
# Externally Redirect containing ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\? [NC]
RewriteCond %{QUERY_STRING} (.*)
RewriteRule (.*) http://example.com/${lc:$1}? [R=301,L]
--

It strips out anything after a ? in any URI. Is this the best way to do it? The RewriteConds check for a ?, and the rule removes the text after it and issues a 301 to the proper URL, which is also lowercased and has any trailing slash removed.

g1smd

11:47 am on Nov 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you just need to test for a query string, you could also use

%{QUERY_STRING} .

A lone "dot" matches any single character, so this condition is true whenever the query string contains at least one character.

No need for the * in that condition. No need for the ( ) either, as you are not reusing that back-reference data elsewhere in the rule.


Your line with THE_REQUEST might not be needed at all, or could be simplified by not using .* but testing for "everything that is not a question mark, up to a question mark" instead.
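
A minimal sketch of the simpler test described above, assuming the goal is only to catch requests that carry actual query data. The rule line is borrowed from the first post, with the lowercasing left out:

```apache
# True whenever the query string contains at least one character
RewriteCond %{QUERY_STRING} .
# The trailing "?" on the target drops the query string from the redirect
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
```

This form does not fire for a bare trailing "?" with nothing after it, because QUERY_STRING is empty in that case.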

jdMorgan

2:42 pm on Nov 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Testing %{THE_REQUEST} is the only way to detect a request with a "?" but no query data, such as
GET /foo.html? HTTP/1.1

So for this application only, I'd suggest keeping the RewriteCond testing %{THE_REQUEST}, but deleting the RewriteCond testing %{QUERY_STRING}, so that URLs with both blank and non-blank query data are detected.

And as g1smd observes, using ".*" is inefficient and should be avoided when possible. You can speed up processing by using:


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?

I also removed [NC] since no valid request will contain lowercase letters in the HTTP method field.
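
To illustrate which requests that condition catches, here is a sketch with sample request lines in the comments. The RewriteRule line is an assumption based on the rules posted earlier, with lowercasing omitted:

```apache
# THE_REQUEST holds the raw request line sent by the client:
#   GET /foo.html HTTP/1.1       no "?", condition fails, no redirect
#   GET /foo.html? HTTP/1.1      bare "?", condition matches (QUERY_STRING is empty)
#   GET /foo.html?x=1 HTTP/1.1   condition matches (QUERY_STRING is "x=1")
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
```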

Jim

g1smd

4:59 pm on Nov 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ah, I missed the fact that "question mark but no query data" was being looked for.

Good catch.

maxed

10:11 pm on Nov 18, 2008 (gmt 0)

10+ Year Member



Thanks for the replies. I made the changes and everything is working fine... :)

I just decided to remove anything after any ? because some people link with random ?434sdf=fsf variables in some URLs, which just seem to cause problems...

maxed

11:09 pm on Nov 18, 2008 (gmt 0)

10+ Year Member



Okay, so I decided that it may also be important to account for any double or triple slashes that occur in the URI. I have got it working to collapse any double slash into a single one, but if there are more than two it does multiple redirects to get rid of each 'bad' slash. And if someone were to request something like
http://www.example.com//index.html then first the bad slash is removed and then another redirect removes the index. I just can't seem to figure out how to make it do everything with one redirect.

At the moment my .htaccess contains
--
ErrorDocument 404 /404.php
Options +FollowSymLinks
RewriteEngine On
#
# Redirect to remove multiple slash within URL-path
RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule .* http://www.example.com${lc:%1}/${lc:%2} [R=301,L]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/${lc:%1} [R=301,L]
#
# Redirect anything with ? and slash
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*)/ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*) http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?d=([a-zA-Z0-9-]+)
RewriteRule ^view\.php$ http://www.example.com/${lc:%1}? [R=301,L]
#
# Redirect "with index" to "lowercased uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with index and upper-case" to "lowercased uri without index"
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/$ http://www.example.com/%1 [R=301,L]

# Externally redirect to convert uppercase to lowercase
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.+) http://www.example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to view.php
RewriteRule ^([a-z0-9-]+)$ view.php?d=$1 [L]
--

I'm also wondering: is the whole
'Redirect "with index and upper-case" to "lowercased uri without index"' section really needed, since the section above it already redirects anything with index to the lowercased version? It would still match any uppercase values, as it uses the NC flag.

And if anyone is wondering, the ${lc:$1} is a tolower function to return the lowercased text, from this original thread:
[webmasterworld.com...]

jdMorgan

1:32 am on Nov 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I want to add a qualification to

"the ${lc:$1} is a tolower function to return the lowercased text"

That is true only if you or your host has defined "lc" as a RewriteMap in the server configuration file and mapped it to mod_rewrite's internal "tolower" function.

This is by no means a "standard" or default feature among hosts.
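
For reference, a sketch of what that server-level definition usually looks like. It has to go in the server or virtual-host configuration, since RewriteMap is not allowed in .htaccess; the map name "lc" is simply the name used in this thread:

```apache
# In httpd.conf or a <VirtualHost> block, not in .htaccess
RewriteEngine On
# Define a map named "lc" backed by mod_rewrite's internal tolower function
RewriteMap lc int:tolower
```

Once the map is defined there, ${lc:$1} can be used in .htaccess rules as shown earlier in the thread.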

Jim

maxed

1:50 am on Nov 19, 2008 (gmt 0)

10+ Year Member



Okay, I've edited my rewrite rules and I've nearly got everything working, except that certain variations of the URI behave strangely. This is making me go crazy!

I removed a few rules that I thought were not necessary, since those cases were already being fixed by earlier rules, and this is what I have now:

--
# Redirect anything with ? and slashes
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*)/+ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*) http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect *only* direct client requests for the script back to friendly URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?d=([a-zA-Z0-9-]+)
RewriteRule ^view\.php$ http://www.example.com/${lc:%1}? [R=301,L]
#
# Redirect "with index" to "lowercased uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/${lc:$1} [R=301,L,NC]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/+$ http://www.example.com/%1 [R=301,L]
#
# Redirect to convert uppercase to lowercase
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.+) http://www.example.com/${lc:$1} [R=301,L]
#
# Externally redirect to get rid of trailing slashes except for home page
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.+)/+$ http://www.example.com/$1 [R=301,L]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/${lc:%1} [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite "friendly" URL requests to view.php
RewriteRule ^([a-z0-9-]+)$ view.php?d=$1 [L]
--

If I enter http://www.example.com/folder/// (folder is a real physical folder), nothing happens: the extra trailing slashes are not removed.

Strangely, if I enter http://www.example.com/FOLDER///
it is redirected to http://www.example.com/folder/ with the extra slashes removed.

Also, if I enter http://www.example.com/FOLDER///? then all the slashes are removed. I need some way to check whether it's a directory with just one slash and, if so, remove the ? and do a proper redirect.

mod_rewrite is killing me, help please.

maxed

3:27 am on Nov 19, 2008 (gmt 0)

10+ Year Member



The part:
# Redirect anything with ? and slashes
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*)/+ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule (.*) http://www.example.com/${lc:$1}? [R=301,L]
#

is actually meant to be
# Redirect anything with ? and slashes
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^([^?]*)/+$ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^([^?]*)$ http://www.example.com/${lc:$1}? [R=301,L]

I can't seem to edit my post.

jdMorgan

4:27 am on Nov 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is in this code, which explicitly requires that the directory NOT exist. If it does exist, no redirect will occur.
 
# Redirect "with / and upper-case" to "lowercased uri without /"
# If the trailing-slash-stripped URL-path contains any uppercase letters
RewriteCond $1 [A-Z]
# get lowercase-converted trailing-slash-stripped URL-path to %1 from trailing-slash-stripped $1
RewriteCond ${lc:$1} ^(.+)$
# And if the requested lowercased, trailing-slash-stripped URL-path does not resolve to an existing directory
RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
# Then redirect to lowercased, trailing-slash-stripped, canonical-domain URL
RewriteRule ^(.+)/+$ http://www.example.com/%1 [R=301,L]

So, you have to ask yourself, "Is this the exact description of the behaviour I want?" It had better be, because that is the behaviour you are going to get.

Realizing that you're likely in a hurry to finish this project and deploy your solution, I still have to say that you *cannot* just copy and paste server configuration code -- And that is what mod_rewrite code is. If you do not understand every single jot and tittle, or you are not sure what effects the code might have on search engine behaviour, then please don't use the code.

Treat it as something dangerous like a chainsaw: Very useful, but potentially deadly if not used properly, and with attention to detail at all times. If you hit a knot in the wood, the saw can jump up like a snake and cut your throat. Or the whole darn tree may fall on you if you focus on the details but don't keep an eye on the overall situation.

I hope you tested the lowercasing on your production server. If you didn't, you should do so now, because it is not likely to work (as stated in my previous post) unless you have added a RewriteMap to your server configuration files (e.g. httpd.conf or conf.d), or have gotten your host to do so.

One further comment: Your new "^([^?]*)$" pattern is very strange -- and not really suitable for use in RewriteRule, because a question mark cannot occur in the URL-path tested by RewriteRule. While it is good to avoid ".*" patterns when more-specific patterns can be used, as previously-discussed, there are just some cases where what is actually needed is the ".*" pattern, and if so, then that is what should be used.

In fact, the most-frequently occurring case for legitimate use of ".*" is in just such cases: Where ".*" stands alone or followed by one or a very few additional characters or a character-group in the pattern.

So, I suggest you delete or comment-out the "directory-exists-check" RewriteCond, and use "^(.*)$" instead of "^([^?]*)$" in your RewriteRules.
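
Applied to the rules posted above, those two changes might look like the following sketch. It assumes the "lc" RewriteMap is available on the server and keeps everything else as posted:

```apache
# Redirect anything with ? and trailing slashes
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^(.*)/+$ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect anything with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^(.*)$ http://www.example.com/${lc:$1}? [R=301,L]
#
# Redirect "with / and upper-case" to "lowercased uri without /"
RewriteCond $1 [A-Z]
RewriteCond ${lc:$1} ^(.+)$
# Directory-exists check commented out so existing directories are redirected too
# RewriteCond %{DOCUMENT_ROOT}/%1/ !-d
RewriteRule ^(.+)/+$ http://www.example.com/%1 [R=301,L]
```

Test this against real directories before deploying, since removing the check changes how requests for existing directories are handled.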

Jim

maxed

5:08 am on Nov 19, 2008 (gmt 0)

10+ Year Member



I have to agree with you: at the moment all this rewriting is too much for me. My host has been very nice to allow me to use the RewriteMap, and yes, it works just fine.

However, as you state, I'm not ready to use mod_rewrite code at this level, with so many rules, just yet. So I'm just going to use some simple rules and return 404s if the URL is not exactly what is needed, instead of redirecting everything to the correct URL. At the moment I've changed everything to this:

--
# Redirect uri with ?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteRule ^([^?]*)$ http://www.example.com/$1? [R=301,L]
#
# Redirect "with index" to "uri without index"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index(\.php|\.html) [NC]
RewriteRule ^(.*)index(\.php|\.html)$ http://www.example.com/$1 [R=301,L,NC]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/%1 [R=301,L]
#
# Externally redirect to force canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com(\.|\.?:[0-9]+)$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Rewrite artist/albumname(album) to album.php
RewriteRule ^([a-z0-9-]+)/([a-z0-9-]+)\(album\)$ album.php?a=$1&p=$2 [L]
--

I also just wanted to ask: my host has added the RewriteMap to the httpd.conf for the site. Is it okay to leave it there since I'm not using it, or should I ask him to remove it?

jdMorgan

2:39 pm on Nov 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Leave it in place. It will have almost zero impact on server performance, since it is processed only once when the server is restarted, and you may need it later... :)

Jim

maxed

9:14 pm on Nov 19, 2008 (gmt 0)

10+ Year Member



Thanks for all the help!