When to use which anchor in RewriteRule and RewriteCond

         

Peter

10:39 pm on Jul 21, 2007 (gmt 0)

10+ Year Member


Hello,

I think (.*) is said to be "greedy". When using it to collect the remaining part of a string with RewriteCond or RewriteRule, such as:

RewriteCond %{REMOTE_ADDR} ^74\.6\.(.*)$
# then use %1 for something
or:
RewriteRule ^(.*)\.htm$ http://www.example.net$1.html? [R=301,L]

should one put both anchors (as above), or is execution faster if only the "known" end is anchored, as follows?

^74\.6\.(.*)
(.*)\.htm$

I think my real question is, how does the routine decide whether to start comparing from the left or from the right, and can one be sure that (.*) will always return everything up to the unanchored (beginning or) end of the line?

Thanks.
Peter.

jdMorgan

1:28 am on Jul 22, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Putting an anchor adjacent to (.*) is not necessary, but it probably helps the mod_rewrite *parser* figure out that the pattern starts or ends there. You can be sure that (.*) and (.+) will match everything they can possibly match.

Because these patterns are greedy, I tend to avoid their use at the start of a pattern when doing so would invoke a back-off-and-retry.

So instead of:


(.*)\.htm$

I'd tend to use:

^([^.]+)\.html$

or, if multiple periods might actually occur in the URL-path:

^(([^.]+)\.)+html$

This avoids having the ".*" initially consume the entire string, and then have to back off one character at a time through "l", "m", "t", "h", and "." to get a match.
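A quick way to see what each of these patterns captures is to run them outside Apache. The sketch below uses Python's re module, which is a backtracking engine but not the regex library mod_rewrite uses, so it only illustrates matching and capturing, not Apache's timing. The helper name first_group and the sample paths are made up for the example.

```python
import re

# The three patterns discussed above, using the ".html" ending throughout.
GREEDY = re.compile(r'(.*)\.html$')
NEGATED = re.compile(r'^([^.]+)\.html$')
REPEATED = re.compile(r'^(([^.]+)\.)+html$')

def first_group(pattern, path):
    """Return what $1 would hold, or None when the pattern does not match."""
    m = pattern.search(path)
    return m.group(1) if m else None

single_dot = 'dir/page.html'
multi_dot = 'dir/my.page.html'

print(first_group(GREEDY, single_dot))    # greedy form on a one-period path
print(first_group(NEGATED, single_dot))   # negated-class form, same path
print(first_group(NEGATED, multi_dot))    # negated-class form rejects extra periods
print(first_group(REPEATED, multi_dot))   # repeated group keeps only its last pass
```

With the repeated-group form, the first group holds only the last repetition (at least in this engine), so it matches multi-period paths but does not hand back the whole path for use in the substitution.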

Jim

Peter

8:53 pm on Jul 22, 2007 (gmt 0)

10+ Year Member



Thank you Jim.

I've tried to apply your principles to the problem of the "double slash" (or multiple slashes in one place) in the URL, which I check for every access with
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
- which must be absurdly expensive (and rarely useful).

What I've come up with is:
RewriteCond %{REQUEST_URI} ^/((([^/]+)/)*)(/+)(.*)
RewriteRule .* http://www.example.net/%1%5? [R=301,L]

This seems to work, but I'm wondering whether it would be better to add a precondition:
RewriteCond %{REQUEST_URI} //
RewriteCond %{REQUEST_URI} ^/((([^/]+)/)*)(/+)(.*)
RewriteRule .* http://www.example.net/%1%5? [R=301,L]

Do you think I'm right in supposing that the first RewriteCond will execute (and almost always fail) much more quickly than the second?
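For anyone who wants to check what the segment-by-segment pattern captures, here is a minimal sketch in Python. It assumes Python's re module behaves like Apache's regex library for this pattern, which is likely but not guaranteed. The function name collapse_first_run is invented for the example; groups 1 and 5 stand in for %1 and %5.

```python
import re

# The pattern from the RewriteCond above; groups 1 and 5 play the role of %1 and %5.
SEGMENT_WALK = re.compile(r'^/((([^/]+)/)*)(/+)(.*)')

def collapse_first_run(path):
    """Return the path with its first run of extra slashes removed,
    or None when there is nothing to fix."""
    if '//' not in path:
        # Cheap precheck, analogous to: RewriteCond %{REQUEST_URI} //
        return None
    m = SEGMENT_WALK.match(path)
    if m is None:
        return None
    return '/' + m.group(1) + m.group(5)

for sample in ('/a/b//c/d', '/a///b', '//a', '/a/b/c', '/a//b//c'):
    print(sample, '->', collapse_first_run(sample))
```

Only the first run of slashes is collapsed per pass, so a path with two separate runs would need a second redirect before it is clean.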

Regards,
Peter.

g1smd

10:03 pm on Jul 22, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In fact,
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$

is pretty much the same as
RewriteCond %{REQUEST_URI} //

I think the advice would be to not bother.

I'll guess that jdMorgan will have the right answer.

jdMorgan

10:15 pm on Jul 22, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, that's the one case where it's probably more efficient to just use "(.*)//+(.*)$" and be done with it.
In .htaccess, you could pre-condition it, at least a bit, by using the RewriteRule pattern:

RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / %1/%2 [L]

Since any URL-path that contains more than one "/" must contain at least one "/", and since RewriteRule patterns must match before RewriteConds are evaluated (see mod_rewrite docs), this at least prevents the whole rule-set from running if there aren't any slashes at all...

And of course, in httpd.conf or conf.d, you can do it without a RewriteCond since the URLs are not localized to a directory and will begin with a slash:


RewriteRule ^(.*)//+(.*)$ $1/$2 [L]

The only way to find out which methods are most efficient is to actually test them. Otherwise, you're just taking someone's word for it. And there exists the possibility that the results may change depending on which OS you test under, and which version of what regex library came with that OS.
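As a rough illustration of what "actually test them" can look like, here is a small timing harness in Python. It is only a sketch of the method: Python's re module is not the regex library Apache links against, so the numbers will not transfer to a real server. The labels, the function name time_patterns, and the sample path are all invented for the example.

```python
import re
import timeit

# The three double-slash checks discussed in this thread.
CANDIDATES = {
    'anchored greedy': re.compile(r'^(.*)//+(.*)$'),
    'bare substring': re.compile(r'//'),
    'segment walk': re.compile(r'^/((([^/]+)/)*)(/+)(.*)'),
}

def time_patterns(path, repeats=2000):
    """Return {label: seconds taken to run the pattern `repeats` times on path}."""
    results = {}
    for label, pattern in CANDIDATES.items():
        results[label] = timeit.timeit(
            lambda p=pattern: p.search(path), number=repeats
        )
    return results

# A typical request with no double slash, which is the common (failing) case.
typical = '/some/ordinary/path/to/a/page.html'
for label, seconds in time_patterns(typical).items():
    print(f'{label:16s} {seconds:.6f} s')
```

On a real server the equivalent would be timing requests against each rule-set in turn, on the OS and Apache build you actually run.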

The one thing you want to avoid, though, is patterns with multiple ambiguous and greedy sub-patterns. They force multiple back-off-and-retry steps, can cause the execution time to grow geometrically, and are especially inefficient if the string to be matched against the pattern has a very long 'tail'.
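To make the "multiple ambiguous and greedy sub-patterns" point concrete, here is a hedged sketch in Python. The pattern and the test strings are invented for the demonstration and are not taken from any real rule-set. Other regex engines, possibly including the one in your Apache build, may short-circuit a case like this, so the growth you see under Python's re module is illustrative only.

```python
import re
import time

# Four greedy groups separated by the same literal: the engine has many ways
# to divide a string of slashes among them before it can give up.
AMBIGUOUS = re.compile(r'^(.*)/(.*)/(.*)/(.*)x$')

def seconds_to_fail(tail_length):
    """Time one failed match against 'x' followed by a tail of slashes."""
    subject = 'x' + '/' * tail_length   # cannot match: the last character is not "x"
    start = time.perf_counter()
    AMBIGUOUS.match(subject)
    return time.perf_counter() - start

for n in (20, 40, 80):
    print(f'tail of {n:3d} slashes: {seconds_to_fail(n):.6f} s')
```

Each doubling of the tail should cost far more than double the time, which is the long-tail effect described above.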

Jim

Peter

10:25 pm on Jul 22, 2007 (gmt 0)

10+ Year Member



Thank you.
Peter.