Generating less rewrites

I've been reading the rewrite logfile...

         

RedAndy

2:05 am on May 10, 2006 (gmt 0)

10+ Year Member



Hi,
I was reading the rewrite logfile on my test server (Apache 2) today and noticed that a fairly light page is generating ~200 lines of log. So I wanted to ask: is this normal?

I modified my code and got it down to ~140 (after blowing it out to >250 :) with the following:


RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} !^example$
RewriteRule (.*)[^(jpg¦gif¦png¦css)]$ http://www.example.com/$1 [R=301,L]

I'm hoping that I haven't reduced the # of lines only to add a much slower pattern?

Is there any way I can reduce the number of rewrite checks further, or should I just not worry at this point? The site responds fine, but it's getting busier all the time and is on shared hosting, so I want to stay on top of it.

Many thanks for your help,

Andy

(The name of my local test server is "example". I suppose I could remove that line on the live server to save another check, but I really like to keep things _exactly_ the same.)

jdMorgan

1:39 pm on May 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



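If you review the operation of character-group patterns, I think you'll find your rule doesn't do what you want it to do. "[^(jpg¦gif¦png¦css)]$" would mean, "ending in any single character except for (, ), j, p, g, ¦, i, f, n, c, or s." Inside square brackets the parentheses and the pipe are just literal characters, not grouping or alternation.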
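To see the difference concretely, here is a minimal sketch in Python. Python's re module is not the regex engine Apache uses, so treat this only as an illustration of character classes versus alternation; the sample paths are made up for the example.

```python
import re

# Negated character class: "the last character is NOT one of ( j p g | i f n c s )".
char_class = re.compile(r"(.*)[^(jpg|gif|png|css)]$")

# Alternation in a group: "ends in .jpg, .gif, .png or .css".
# In mod_rewrite, a leading "!" would then negate the whole match.
alternation = re.compile(r"\.(jpg|gif|png|css)$")

# Hypothetical request paths, chosen to show where the two patterns disagree.
for path in ["logo.jpg", "products", "about.html", "contact"]:
    print(
        f"{path:12} char-class match: {bool(char_class.search(path))!s:6}"
        f" alternation match: {bool(alternation.search(path))}"
    )
```

Note that "products" ends in "s", so the character-class version refuses to match it even though it is not an image or stylesheet.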

Also, if your intent is to exclude both www.example.com and example.com, you can eliminate the second redundant RewriteCond by making the "www." optional in the first one.

So, a better way to write it might be:


RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com
RewriteRule !\.(jpg¦gif¦png¦css)$ http://www.example.com%{REQUEST_URI} [R=301,L]

Note that we must use %{REQUEST_URI}, because you cannot back-reference the contents of a negative pattern. In other words, as documented in the Apache manual, the following WILL NOT WORK:

RewriteRule !\.(jpg¦gif¦png¦css)$ http://www.example.com$1 [R=301,L]

Also note that if you are hosted on a name-based virtual server, you can remove the first RewriteCond that checks for a blank hostname. Name-based servers cannot be accessed at all unless the hostname is not blank, and only old true HTTP/1.0 clients won't send the hostname header. For name-based servers, there's no use checking for a blank hostname.

Replace all broken pipe "¦" characters above with solid pipes before use; posting on this forum modifies the pipe characters.
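For anyone who wants to reason about the three-line ruleset without a server to hand, its decision logic can be sketched in Python as below. This is only a model under stated assumptions: the function name redirect_target is hypothetical, the same string is used both for the pattern test and for the substitution (Apache actually tests the RewriteRule pattern against the URL-path), and Python's re stands in for Apache's regex engine. The checks are ordered the way mod_rewrite evaluates them: the RewriteRule pattern first, then the RewriteCond lines.

```python
import re
from typing import Optional

CANONICAL = "http://www.example.com"

# Same expressions as the rules above, written with solid pipes.
STATIC_FILE = re.compile(r"\.(jpg|gif|png|css)$")
CANONICAL_HOST = re.compile(r"^(www\.)?example\.com")


def redirect_target(host: str, request_uri: str) -> Optional[str]:
    """Return the 301 target URL, or None if the request passes through.

    Hypothetical helper that models the ruleset; it is not part of Apache.
    """
    # RewriteRule !\.(jpg|gif|png|css)$  -> rule applies only to non-static paths
    if STATIC_FILE.search(request_uri):
        return None
    # RewriteCond %{HTTP_HOST} .         -> skip requests with a blank hostname
    if not host:
        return None
    # RewriteCond %{HTTP_HOST} !^(www\.)?example\.com
    if CANONICAL_HOST.search(host):
        return None
    # Substitution: http://www.example.com%{REQUEST_URI}
    return CANONICAL + request_uri


if __name__ == "__main__":
    # Made-up hostnames and paths, for illustration only.
    for host, uri in [
        ("foo.example.net", "/products"),
        ("www.example.com", "/products"),
        ("foo.example.net", "/logo.jpg"),
        ("", "/products"),
    ]:
        print(repr(host), uri, "->", redirect_target(host, uri))
```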

Jim

RedAndy

5:00 am on May 13, 2006 (gmt 0)

10+ Year Member



Thanks Jim,

I have it down to 97 log entries now, most of which are pass-throughs: a vast improvement on the 200+ I started with. Here's my 'final' set of rules:


Options -Indexes
#Enable
RewriteEngine On
#Ensure WWW or dev server
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} !^example(:[0-9]+)?$
RewriteRule !\.(jpg¦gif¦png¦css¦js)$ http://www.example.com%{REQUEST_URI} [R=301,L]

#send into CMS & allow blank - goes to the home page
RewriteRule ^([a-z0-9_/]{3,255}¦)$ index.php?url=$1 [L]

I left in "RewriteCond %{HTTP_HOST} ." as in a previous post you advised that if my site is the default host on a shared name based server it will fail with HTTP/1.0 clients without it - can I safely remove it?

I haven't had any problems with the 2nd (and last) RewriteRule, but is (something¦) an acceptable pattern? I.e., does "or nothing" work, or am I setting myself up for a problem later?

thanks,

Andy


jdMorgan

5:14 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> can I safely remove "RewriteCond %{HTTP_HOST} ."?
Loaded question, that one... I'll leave that for you to test and decide.

> but is (something¦) an acceptable pattern ...?

Actually, I've never tried that. Test it, find out, and let us know!

To accept either the pattern-specified string or <blank>, I would habitually use:


RewriteRule ^([a-z0-9_/]{3,255})?$ /index.php?url=$1 [L]

where the "?" makes the entire preceding parenthesized expression optional.
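As a quick sanity check, the two spellings can be compared side by side in Python. This is a sketch only: Python's re is a different engine from whatever a given Apache build uses, so agreement here says nothing about whether a particular Apache version will accept the empty-alternative form. The sample strings are invented.

```python
import re

# "(something|)" : an alternation whose second branch is empty.
empty_branch = re.compile(r"^([a-z0-9_/]{3,255}|)$")

# "(something)?" : the whole group made optional.
optional_group = re.compile(r"^([a-z0-9_/]{3,255})?$")

# Made-up inputs: blank, too short, valid, valid with slash,
# uppercase (rejected by the class), and too long.
samples = ["", "ab", "products", "products/blah", "Products", "a" * 256]

for s in samples:
    a = bool(empty_branch.match(s))
    b = bool(optional_group.match(s))
    label = s if len(s) <= 20 else s[:17] + "..."
    print(f"{label!r:24} empty-branch: {a!s:6} optional-group: {b!s:6} agree: {a == b}")
```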

Jim

RedAndy

8:55 am on May 13, 2006 (gmt 0)

10+ Year Member



>> but is (something¦) an acceptable pattern ...?

>Actually, I've never tried that. Test it, find out, and let us know!

Well, I was using it before I posted and it's working as expected on apache2-2.0.49. It doesn't rewrite 'unacceptable' requests, but does rewrite the ones I want. I'm trying to think of more tests to try and crash it but haven't come up with much. Ideas very welcome.

>To accept either the pattern-specified string or <blank>, I would habitually use:

>RewriteRule ^([a-z0-9_/]{3,255})?$ /index.php?url=$1 [L]

I wonder which is more efficient? Yours has the advantage of being certainly legal syntax, which has a lot to be said for it.

Do you know where I can find out what the number in parentheses, the "(3)" in the line below, means? I've been looking through the Apache docs but haven't found anything; it seems to vary from 1 to 4.


127.0.0.1 - - [13/May/2006:16:50:51 +1200] [example/sid#811df48][rid#8250e88/initial] (3) [per-dir /srv/example/example.com/] add path info postfix: /srv/example/example.com/products -> /srv/example/example.com/products/blah/blah
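For counting how many log lines each request generates, a rough Python sketch like the following may help. The field layout is inferred from this one sample line only, so the regular expression, the group names, and the helper lines_per_request are all assumptions rather than a documented format; SAMPLE is the line from this post without any forum markup.

```python
import re
from collections import Counter
from typing import Iterable

# Layout guessed from a single sample line; adjust if your log differs.
LOG_LINE = re.compile(
    r"^(?P<remote>\S+) \S+ \S+ "
    r"\[(?P<time>[^\]]+)\] "
    r"\[(?P<server>[^/\]]+)/sid#(?P<sid>[0-9a-f]+)\]"
    r"\[rid#(?P<rid>[0-9a-f]+)/(?P<kind>[^\]]+)\] "
    r"\((?P<number>\d+)\) "                     # the parenthesized number
    r"(?:\[per-dir (?P<per_dir>[^\]]+)\] )?"    # only present for per-directory rules
    r"(?P<message>.*)$"
)

SAMPLE = (
    "127.0.0.1 - - [13/May/2006:16:50:51 +1200] "
    "[example/sid#811df48][rid#8250e88/initial] (3) "
    "[per-dir /srv/example/example.com/] add path info postfix: "
    "/srv/example/example.com/products -> "
    "/srv/example/example.com/products/blah/blah"
)


def lines_per_request(lines: Iterable[str]) -> Counter:
    """Count log lines per request id (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m:
            counts[m.group("rid")] += 1
    return counts


if __name__ == "__main__":
    m = LOG_LINE.match(SAMPLE)
    print(m.groupdict())
    print(lines_per_request([SAMPLE, SAMPLE]))
```

Bear in mind that one page view usually means several HTTP requests (the page plus its images, stylesheets and scripts), each with its own rid, so a per-rid count is a fairer measure than lines per page.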

RedAndy

7:15 am on May 23, 2006 (gmt 0)

10+ Year Member



In case anyone else comes across this thread later: I just got bitten by the (something¦) pattern. It works on Apache 2.x but _not_ 1.x. Jim's (something)? pattern is sweet on both.

Andy


jdMorgan

2:03 pm on May 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



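That's the log level of that particular message, IIRC, not a count: each line mod_rewrite writes is tagged with the RewriteLogLevel at which it starts being logged, so a higher RewriteLogLevel setting produces more lines per request. The internal redirect count for the HTTP request is a separate thing. If that gets too high and exceeds max_redirects, the server will give up and error out, since that indicates you're stuck in a loop.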

Jim

RedAndy

8:17 pm on May 23, 2006 (gmt 0)

10+ Year Member



It seems to fail before that point. The message in the error log is:
blah blah blah
RewriteRule: cannot compile regular expression '^([a-z0-9_/]{3,255}¦)$'\n

I think it just tastes bad to mod_rewrite in 1.x

jdMorgan

9:19 pm on May 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My post in msg #7 was intended to answer your last question in msg #5, if it wasn't clear.

I still use Apache 1.x on most of my commercially-hosted sites, and I suspect that much of the rest of the world does, too. Hosting companies and many corporations are hesitant to switch to something new, and prefer the older, perceived-as-more-stable Apache 1.x. I suspect it will take several more years to see a big move to Apache 2.x.

Jim