Working code: www to non-www and uppercase to lowercase - pls review

DyeA

3:57 am on May 9, 2008 (gmt 0)

10+ Year Member



Wow, this is a relief: thanks to the forum I have working code to go from www to non-www and from uppercase to lowercase. Could someone with a practised eye verify that there are no hidden gotchas here that I am not seeing because I am a newb? The code seems to work on www.example.com, example.com/Apple/apple.htm, example.com/Apple.htm, www.example.com/Apple.htm, and example.com/Apple/Apple.htm.

Here's the code I'm using:
## Rewrite for www to nonwww urls
RewriteEngine on
#
RewriteCond %{HTTP_HOST} ^www\.(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ [%1...] [R=301]

## Rewrite to force lowercase
RewriteMap lowercase int:tolower
RewriteCond $1 [A-Z]
RewriteRule ^(.*)$ ${lowercase:$1} [R=301]

I have placed this in the httpd.conf of my main server, but it is not inside a container, directory section, or any other "tag".

My virtual hosts have this code in their vhost.conf files
RewriteEngine on
RewriteOptions inherit

Is this acceptable?
If I were to put an [L] on one of my rules in httpd.conf, would that prevent a rule from running in my vhost.conf (if I were to put rules in there at a later date)?

jdMorgan

4:16 am on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should reverse the rules, putting the most-specific first, and I'd recommend adding the hostname to the lowercasing rule, since this is an external redirect as well. You want to do the redirect "all at once" whenever possible, and avoid "chained" or "stacked" redirects.

RewriteEngine on
#
# Rewrite to force lowercase
RewriteMap lowercase int:tolower
RewriteCond $1 [A-Z]
RewriteCond %{HTTP_HOST} ^(www\.)?(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ http://%2/${lowercase:$1} [R=301,L]
#
# Rewrite for www to non-www urls
RewriteCond %{HTTP_HOST} ^www\.(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]

The [L] flag stops rewriterule processing only in the current context, and will not affect other rules located in vhost.conf, conf.d or .htaccess files.
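To make the inheritance setup concrete, here is a minimal sketch of a vhost.conf under the arrangement described in the question. The ServerName, ServerAlias and the /old-page.htm rule are hypothetical, added only for illustration. One point worth checking against your Apache version: the mod_rewrite documentation for RewriteOptions says that inherited rules are applied after the rules defined in the child scope, so the vhost's own rules run first.

```apache
<VirtualHost *:80>
    # Hypothetical names, for illustration only.
    ServerName example.com
    ServerAlias www.example.com

    RewriteEngine on
    # Pull in the RewriteMap, conditions and rules defined at main-server level.
    RewriteOptions inherit

    # Hypothetical vhost-only rule. It is evaluated before the inherited
    # main-server rules, so an [L] in httpd.conf cannot prevent it from running.
    RewriteRule ^/old-page\.htm$ http://example.com/new-page.htm [R=301,L]
</VirtualHost>
```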

Jim

DyeA

4:41 am on May 9, 2008 (gmt 0)

10+ Year Member



Thanks for the reply Jim, I don't understand though, it appears to me that if someone comes in with http://www.example.com/Apple.htm it is going to match in the lowercasing ruleset and the [L] is going to stop it from being checked for a www? Am I not understanding how the [L] works?

Secondly, could you explain your use of the term "external" redirect in reference to the lowercasing rule? If the user comes in with example.com/Apple.htm and we redirect them to example.com/apple.htm how is that external? I guess I'm just not sure what "external" means in this context.

Thanks much,
Ward

g1smd

9:20 am on May 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



By mentioning the non-www domain name in the target of the to-lower-case redirect, you will fix both the upper-case problem and the www problem (if present) at the same time. That redirect therefore works for both www and non-www requests and changes the URL to lower-case, as well as forcing non-www at the same time. That rule should therefore come first.

For any URL that is already lower-case, the first rule will be skipped, and the www to non-www redirect will then kick in afterwards, only if it is required.

So, for a request that is:
www and upper-case -- fixed by first rule to non-www and lower-case.
non-www and upper-case -- fixed by first rule to non-www and lower-case.
www and lower-case -- fixed by second rule to non-www (and is already lower-case).
non-www and lower-case -- no action taken.

With the rules the other way round, or the first rule not fixing the www, a request for www and upper-case would see one rule fixing the www to non-www and the other rule fixing the upper-case to lower-case. That means the browser makes three requests in all. That is a redirection chain, and is a very bad move.
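Traced against Jim's two rules, which redirect to the host without the www prefix, the cases work out as annotated in the sketch below. The rules are the ones posted earlier in the thread; the example.com URLs in the comments are illustrative.

```apache
RewriteEngine on
RewriteMap lowercase int:tolower

# Rule 1: the path contains an uppercase letter, with or without www.
#   www.example.com/Apple.htm  -> http://example.com/apple.htm  (one redirect)
#   example.com/Apple.htm      -> http://example.com/apple.htm  (one redirect)
# %2 is the hostname with any leading "www." left out.
RewriteCond $1 [A-Z]
RewriteCond %{HTTP_HOST} ^(www\.)?(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ http://%2/${lowercase:$1} [R=301,L]

# Rule 2: only reached when the path is already lowercase.
#   www.example.com/apple.htm  -> http://example.com/apple.htm  (one redirect)
#   example.com/apple.htm      -> no rule matches, served as-is
RewriteCond %{HTTP_HOST} ^www\.(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ http://%1/$1 [R=301,L]
```

In every case the client sees at most one 301 before reaching the final URL.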

DyeA

6:49 pm on May 9, 2008 (gmt 0)

10+ Year Member



Ahhhh thanks, it didn't even register with me that Jim had put the www replace regex in there as well. Makes sense.

Above this code I have placed my rewrites for a series of 25 specific pages that need to be directed to a specific new page or different tld. They are written like this.

RewriteCond $1 ^/example\.htm$ [NC]
RewriteRule ^(.*)$ http://example.com/ [R=301,L]

See any issues with the rule or the position in execution order?

Thanks,
Ward

jdMorgan

5:46 pm on May 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As long as you're going from most-specific to least-specific, there's no problem.

However, your rule construct is rather inefficient, since the same function can be (and should be) accomplished without a RewriteCond:


RewriteRule ^/example\.htm$ http://example.com/ [R=301,L]
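For illustration, a handful of such one-line rules might look like the sketch below. The paths and target URLs are hypothetical placeholders, not the actual 25 pages. The [NC] flag is carried over from the original RewriteCond, on the assumption that mixed-case requests for these pages should be caught here in a single redirect rather than falling through to the lowercasing rule first.

```apache
RewriteEngine on

# Most-specific redirects first, one line each (hypothetical paths and targets).
RewriteRule ^/example\.htm$ http://example.com/ [NC,R=301,L]
RewriteRule ^/old-catalog\.htm$ http://example.com/catalog.htm [NC,R=301,L]
RewriteRule ^/moved-offsite\.htm$ http://example.org/ [NC,R=301,L]

# The general lowercase and www rules follow below these.
```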

The only time you must use a RewriteCond to check a single requested local URL-path is when you need to do two things in the same rule:
  • Use a negative-match pattern on that URL-path
    -AND-
  • Also create a back-reference on that URL path

    Due to the way mod_rewrite works, it is not possible to create a back-reference on a negative match, which is why you'd need a RewriteCond in this specific case.

    For example, you could re-code the first rule I posted above as:


    # Rewrite to force lowercase
    RewriteMap lowercase int:tolower
    RewriteCond %{HTTP_HOST} ^(www\.)?(([^.]+\.)+([^.]+))
    RewriteRule ^/([^A-Z]*[A-Z].*)$ http://%2/${lowercase:$1} [R=301,L]

    However in this case, the way I posted it above is clearer, and since it's a one-off rule, having the extra RewriteCond line makes little difference. But since you're adding 25 specific rules, reducing them each to one line is a significant improvement.
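As an illustration of the negative-match-plus-back-reference case described above, here is a minimal sketch. The /keep/ directory and the new.example.com target are hypothetical; the point is only that the capture happens in the RewriteRule pattern while the negated test sits in the RewriteCond.

```apache
RewriteEngine on

# Hypothetical case: send every path that does NOT start with /keep/ to a
# new host, preserving the path.
# The rule pattern captures the path as $1 (the back-reference); the negated
# test has to live in the RewriteCond, because a negated pattern cannot
# capture anything.
RewriteCond $1 !^keep/
RewriteRule ^/(.*)$ http://new.example.com/$1 [R=301,L]
```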

    Jim

    DyeA

    11:17 pm on May 10, 2008 (gmt 0)

    10+ Year Member



    Good point! I rewrote those and removed the unnecessary RewriteCond. I have to say this server stuff is kind of addicting - I am enjoying it. I probably tend to ask too many unnecessary questions, but understanding it interests me.

    Regarding your post: by rewriting these rules we have eliminated 25 lines of code, which is good, but in terms of CPU load we probably haven't saved anything unless there is a hit. That is, on a non-match we either compare against the Cond and then stop, or compare against the Rule and stop, so there would be no extra processing. On a match, we save a comparison because we don't have to run one regex in the Cond and another in the Rule, and we don't create and store an unnecessary back-reference. Am I right in thinking this?

    My next step will be to create a symbolic link on the server to a directory called "common" and store all CSS and JS files there, since all the sites I am running use the same files. Having only one copy to modify and update should save me some steps. Then I would like to experiment with manually gzipping all CSS, JS, and HTML files, having the server perform content negotiation and serve the compressed version if it exists and the UA says it accepts it, to see if I can speed up site rendering this way.

    I'm following the YSlow school of optimization, so I will probably move my JS to a different, central hostname on the server to allow more parallel requests, move the JS to the bottom of the HTML, and perhaps host my own analytics code as well.

    Thanks for the help!
    Ward

    jdMorgan

    4:57 am on May 11, 2008 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    ...compare against the Cond and then stop...

    Read the mod_rewrite documentation: RewriteConds are not processed at all unless the RewriteRule pattern matches. Replacing the catch-all pattern in your RewriteRule with that from the RewriteCond, and thus eliminating the need for the RewriteCond saves quite a bit of wasted CPU time processing both RewriteRule and RewriteCond directives. :)

    Avoid the temptation to view httpd.conf or .htaccess code as a "program" -- it is not. The directives are not executed sequentially; they are processed in turn by each Apache module, which handles only the directives it understands.

    Generally, directives belonging to the same module will be executed sequentially, but again we have a partial exception here, in that the RewriteRule pattern is processed first. If there is a match, then the RewriteConds are processed, and if they evaluate as True, then the RewriteRule rewriting phase is invoked.
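The evaluation order described above can be annotated on the lowercasing rule from earlier in the thread. This is a sketch for illustration; the step numbers in the comments show the sequence mod_rewrite follows, not the order the lines appear in the file.

```apache
RewriteEngine on
RewriteMap lowercase int:tolower

# Step 1: the RewriteRule pattern ^/(.*)$ is matched against the URL-path.
#         If it fails, the RewriteCond lines are never evaluated.
# Step 2: the RewriteConds are evaluated in order; $1 already holds the
#         path captured in step 1, which is why a condition can test it.
# Step 3: only if every condition is true is the substitution built, using
#         $N from the rule pattern and %N from the last matched condition.
RewriteCond $1 [A-Z]
RewriteCond %{HTTP_HOST} ^(www\.)?(([^.]+\.)+([^.]+))
RewriteRule ^/(.*)$ http://%2/${lowercase:$1} [R=301,L]
```

This is also why a catch-all rule pattern paired with a specific RewriteCond costs more than a specific rule pattern alone: the catch-all matches every request, so the condition is tested every time.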

    Module execution order is determined by the reverse LoadModule order on Apache 1.x, and by an internal priority scheme on Apache 2.x.

    Jim

    g1smd

    4:40 pm on May 11, 2008 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    *** RewriteConds are not processed at all unless the RewriteRule pattern matches. Replacing the catch-all pattern in your RewriteRule with that from the RewriteCond, and thus eliminating the need for the RewriteCond saves quite a bit of wasted CPU time processing both RewriteRule and RewriteCond directives. :) ***

    *** Generally, directives belonging to the same module will be executed sequentially, but again we have a partial exception here, in that the RewriteRule pattern is processed first. If there is a match, then the RewriteCond is processed, and if it evaluates as True, then the RewriteRule rewriting phase is invoked. ***

    How the heck did I miss this vital piece of information over the last few years?

    That explains several questions that I was wanting to ask but could not formulate into words.

    You know how it is when something is bugging you, but you aren't sure what it is and why...

    I had always assumed that the Rule wasn't processed unless all the Conditions were matched.

    That is partially true, but depends on what is meant by "the Rule isn't processed". I now see that means that the right-hand side isn't formulated, but had missed the fact that the left-side is read first.

    jdMorgan

    8:39 pm on May 11, 2008 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Yeah, that's why RewriteConds and RewriteRules can back-reference each other's pattern-matches... :)

    See the "Processing" section in the mod_rewrite doc -- There's even a nice picture of how rules are processed.
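To watch this processing order on a live server, mod_rewrite's own logging can help. The sketch below is an assumption about your setup, not something from the thread: the directive to use depends on the Apache version, and the log path is a placeholder.

```apache
# Apache 2.4 and later: rewrite tracing is written to the ErrorLog.
LogLevel alert rewrite:trace3

# Apache 2.2 and earlier use dedicated directives instead
# (server config or virtual host context only; placeholder path):
# RewriteLog "/var/log/apache2/rewrite.log"
# RewriteLogLevel 3
```

Tracing slows the server noticeably, so it is best enabled only while debugging.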

    The module-execution-order thing is very hard to explain. I still haven't found a concise-but-precise way of putting it into words.

    Jim