Forum Moderators: phranque

Can .? be used in place of .* in RewriteRule?

MickeyRoush

1:44 pm on Aug 19, 2011 (gmt 0)

10+ Year Member



Another forum suggested replacing .* with .?

Quote:
Because .* is the most nasty regex there is and should be avoided at all costs


So instead of it looking like:
RewriteRule .* - [F]

Would this be more efficient:
RewriteRule .? - [F]

I know that . means "any single character", * means "zero or more of", and ? means "zero or one of".

So will replacing * with ? work just as well?

Any insights would be greatly appreciated.

wilderness

2:21 pm on Aug 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Forum library [webmasterworld.com] contains an explanation of Regular Expressions [webmasterworld.com]

Regarding the "?" and "brackets or parentheses":
? matches 0 or 1 of the characters or set of characters in brackets or parentheses immediately before it. (EG a? would match the lowercase letter 'a' 0 or 1 time, (abc)? would match the phrase 'abc' 0 or 1 time, while [a-z]? would match any lowercase letter from 'a to z' 0 or 1 time.)
end of quote

I seem to recall the "." being used (on its own) in some RewriteRules, however I don't recall seeing a "?" used that way.

I believe g1smd's explanation was regarding RewriteCond and NOT the closing pattern commonly used in RewriteRule.

g1smd

6:48 pm on Aug 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Using ^.$ means "a single character".

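Using . with no anchoring matches any input of "one or more characters", because the pattern only has to match somewhere in the input.

Using .* without anchoring, or ^.*$ with anchoring, allows for "zero or more characters".

Using ^.?$ is "zero or one character".

Using .? with no anchoring is "zero or one or more characters", in other words anything at all.

With no anchoring there's not a lot of difference between .* and .? because both match every input.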
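The cases above can be checked with a quick sketch. This uses Python's re module rather than Apache's PCRE engine, on the assumption that the two behave the same for patterns this simple; the sample inputs are made up for illustration.

```python
import re

# Hypothetical sample inputs: empty, one character, several characters.
samples = ["", "a", "abc"]

patterns = [r"^.$", r".", r".*", r"^.*$", r"^.?$", r".?"]


def matches(pattern: str, text: str) -> bool:
    # re.search looks for a match anywhere in the text,
    # which is how an unanchored pattern is applied.
    return re.search(pattern, text) is not None


# One row per pattern: does it match "", "a", "abc"?
table = {p: [matches(p, s) for s in samples] for p in patterns}

for p, row in table.items():
    print(f"{p:6} {row}")
```

Only the anchored patterns ^.$ and ^.?$ care about the length of the input. Unanchored, .* and .? accept all three samples, and a bare . rejects only the empty one.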


There's a time and a place for the .* pattern: either on its own, like .* or (.*), OR on the very end of a pattern with $ anchoring, like something(.*)$ or (something.*)$, but only where the pattern is captured as a backreference.

Using .* at the beginning or in the middle of a longer pattern is always asking for trouble as it causes the parser to attempt thousands of "back off and retry" trial matches.
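To get a feel for where the "thousands" come from, here is a deliberately naive model of a backtracking matcher for an unanchored .* followed by a literal. The function count_trials is hypothetical and written only for this illustration; real engines such as PCRE apply optimizations, so actual counts will differ.

```python
def count_trials(text: str, literal: str) -> int:
    """Count the literal comparisons a naive backtracking engine makes
    for the unanchored pattern `.*` followed by `literal`.

    Simplified model only: assumes no newlines in `text` and no
    engine-level optimizations.
    """
    trials = 0
    n = len(text)
    # An unanchored pattern is retried from every start offset.
    for start in range(n + 1):
        # Greedy .* first grabs everything up to the end of the text,
        # then backs off one character at a time ("back off and retry").
        for end in range(n, start - 1, -1):
            trials += 1
            if text.startswith(literal, end):
                return trials
    return trials


# Hypothetical input: 100 characters that never contain the literal.
print(count_trials("a" * 100, "xyz"))
```

In this model a 100-character input with no match costs 5151 comparisons before the engine gives up, and the count grows roughly with the square of the input length.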

Using .* on the end of a pattern like something.*$ where the .* pattern is not being captured is also redundant and the .*$ part can simply be deleted.
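A small check of that redundancy, again sketched with Python's re module as a stand-in for PCRE. The paths are invented examples, and the claim is only tested for single-line inputs such as URL paths.

```python
import re

with_tail = re.compile(r"something.*$")
without_tail = re.compile(r"something")
captured_tail = re.compile(r"something(.*)$")

# Hypothetical single-line URL paths.
paths = ["something", "something/else", "prefix/something.html", "nothing", ""]


def same_verdict(path: str) -> bool:
    # Both patterns should agree on whether the path matches at all.
    return bool(with_tail.search(path)) == bool(without_tail.search(path))


results = [same_verdict(p) for p in paths]
print(results)

# The trailing .* only earns its keep when it is captured:
tail = captured_tail.search("something/else").group(1)
print(tail)
```

The uncaptured tail never changes the match verdict on these inputs, whereas the captured version hands back the rest of the path for use as a backreference.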

lucy24

11:00 pm on Aug 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So instead of it looking like:
RewriteRule .* - [F]

Would this be more efficient:
RewriteRule .? - [F]

In this specific situation it doesn't make any particular difference, because neither anchors nor captures are involved. Presumably the Rule is preceded by Conditions (IP, UA, Referer or whatnot) that specify who is to get the door slammed in their face, since otherwise it would mean "Don't let anyone in here ever" :)

-- which, come to think of it, might be a perfectly legitimate rule if you've got something like an admin directory that nobody including you would ever browse.

MickeyRoush

11:41 pm on Aug 21, 2011 (gmt 0)

10+ Year Member



@lucy24

In this specific situation it doesn't make any particular difference, because neither anchors nor captures are involved. Presumably the Rule is preceded by Conditions (IP, UA, Referer or whatnot) that specify who is to get the door slammed in their face, since otherwise it would mean "Don't let anyone in here ever" :)


Here is an example:

RewriteCond %{QUERY_STRING} concat[^\(]*\( [NC,OR]
RewriteCond %{QUERY_STRING} union([^s]*s)+elect [NC,OR]
RewriteCond %{QUERY_STRING} union([^a]*a)+ll([^s]*s)+elect [NC]
RewriteRule .* - [F]

Would there be any difference if the last line was changed to:

RewriteRule .? - [F]


From everyone's reply so far, I'm assuming that there's almost no difference.


Also, I appreciate everyone's input on this.

lucy24

12:35 am on Aug 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No difference at all whatsoever. But since you're looking at requests that contain query strings, you could probably save the computer a bit of backtracking by replacing it with something like

\.html$ (or \.php or whatever your extension is)

so Apache doesn't waste time going back and checking for query strings when the request was for a directory. In fact, if only specific pages ever have these query strings, you might do better to constrain the rule to those pages.

And speaking of queries-- yawp! gotta double-check this-- in your Conditions, are those plus signs meant as RegEx operators or as literal plusses? If it were anywhere but a query string, the question wouldn't arise. But in queries, spaces get turned into plusses, which would have to be \+ escaped.

If they are RegEx operators they may not be doing what you intend. For example in Condition #2 the [^s]* would not prevent the computer from grabbing the word "elect"-- which contains no esses-- and then having to backtrack when it reaches the end of the query and finds there are no more esses, and it's still got to set aside an "elect". Are there potential queries containing "union" and "elect" that do not fit into either pattern, and therefore would not fail?

g1smd

1:10 am on Aug 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The example code comes from the latest version of the Joomla standard .htaccess file; authored by, err, someone in this forum.

The ([^s]*s)+ works as intended. :)
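For anyone who wants to poke at the three Conditions, here is a rough test harness. It uses Python's re module with IGNORECASE standing in for the [NC] flag, on the assumption that it agrees with Apache's PCRE for these patterns; the helper is_blocked and the query strings are made up for illustration.

```python
import re

# The three RewriteCond patterns from the example; re.IGNORECASE plays
# the role of [NC], and any() plays the role of the [OR] chaining.
CONDITIONS = [
    re.compile(r"concat[^\(]*\(", re.IGNORECASE),
    re.compile(r"union([^s]*s)+elect", re.IGNORECASE),
    re.compile(r"union([^a]*a)+ll([^s]*s)+elect", re.IGNORECASE),
]


def is_blocked(query_string: str) -> bool:
    # True when at least one condition matches somewhere in the query.
    return any(c.search(query_string) for c in CONDITIONS)


# Hypothetical query strings.
queries = [
    "id=1+union+select+password",
    "id=1+UNION+ALL+SELECT+1",
    "id=1/**/union/**/select",
    "name=concat(a,b)",
    "page=2&sort=asc",
    "q=credit+union+elections",
]

verdicts = {q: is_blocked(q) for q in queries}
for q, blocked in verdicts.items():
    print(f"{blocked!s:5} {q}")
```

The + after each group is acting as a RegEx operator here: ([^s]*s)+ walks forward one "s" at a time until it finds one followed by "elect". A query with "union" and "elect" but no "s" directly before "elect", like the last sample, is not matched.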

MickeyRoush

8:04 am on Aug 23, 2011 (gmt 0)

10+ Year Member



Yes g1smd, I believe it's your work.

Thanks for everyone's input.

lucy24

3:27 am on Aug 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The example code comes from the latest version of the Joomla standard .htaccess file; authored by, err, someone in this forum.


:: memo to self: This is not the RMCA. This is the forum with grownups in it ::

Oops.