Forum Moderators: phranque
I've always used (.*) because whenever I tried longer patterns, they never worked for me, until just now, when I tried allowing extra characters through. For security, should I bother changing all my mod_rewrite'd sites from (.*) to ([0-9a-zA-Z\-\.\'\_]+)?
Remember that you can use the [NC] flag to avoid 26 of those character compares: [a-z] with the [NC] flag is equivalent to [A-Za-z].
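A quick Python sketch of that equivalence (Python's regex semantics are close enough to mod_rewrite's PCRE-style engine for this point; the test strings are just examples):

```python
import re

# [a-z] plus a case-insensitive flag accepts exactly the same strings
# as the spelled-out [A-Za-z], while giving the engine one character
# range to test per position instead of two.
pattern_nc = re.compile(r'^[a-z]+$', re.IGNORECASE)  # like [a-z] with [NC]
pattern_both = re.compile(r'^[A-Za-z]+$')            # both ranges spelled out

for s in ('Page', 'INDEX', 'about', 'Mixed'):
    assert bool(pattern_nc.match(s)) == bool(pattern_both.match(s))
```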
There is another good reason not to use ".*": it is the greediest, most promiscuous pattern, and it often forces the regular-expression engine to backtrack several times before finding a match.
For a simple example, take the pattern ^(.*)\.html$. A much more efficient pattern would be ^([^.]+)\.html$ because it allows a single-pass match evaluation from left to right.
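You can watch that difference in any PCRE-style engine; a small Python illustration (the filename is made up for the demo):

```python
import re

# With the greedy pattern, (.*) first swallows the whole string
# "page.name.html", then gives characters back one at a time until
# "\.html$" can match, so the capture ends up as "page.name".
greedy = re.match(r'^(.*)\.html$', 'page.name.html')
assert greedy.group(1) == 'page.name'

# The restrictive pattern matches in a single left-to-right pass.
# Note it is not equivalent, though: [^.]+ only accepts names with
# no embedded dots, so "page.name.html" is rejected outright.
assert re.match(r'^([^.]+)\.html$', 'page.name.html') is None
assert re.match(r'^([^.]+)\.html$', 'page.html').group(1) == 'page'
```

The trade-off is the usual one: the tighter character class buys speed and predictability at the cost of matching a narrower set of URLs.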
Anyway, back to your original question: a compromise would be to use ".*" and ".+" type patterns on the URL-path part of a request, but the more restrictive ([0-9a-z.'_\-]+) with the [NC] flag on query strings.
Jim
RewriteRule ^([a-z0-9._\-]+)/([^.]+)\.html$ cgi-bin/file.cgi?Operation=something&something=$1 [NC,L]
I prefer to do parameter validation in the scripts themselves, since Perl and PHP both have better regular-expression and string-handling facilities than mod_rewrite. I also code scripts with the approach recommended by most security experts: define exactly and restrictively what the script will accept, rather than trying to predict all possible exploits and reject those. The latter approach is a maintenance nightmare, and it leaves the script open to exploitation until you discover or are informed of an exploit and can code a fix. The 'restrictive' approach is undoubtedly what inspired the source of information that led you to post your question.
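A minimal sketch of that whitelist-first style (shown in Python for brevity, though the same idea applies in Perl or PHP; the pattern, length cap, and function name are illustrative assumptions, not anyone's actual rules):

```python
import re

# Accept only what is explicitly allowed: letters, digits, dot,
# underscore, apostrophe, and hyphen, with an arbitrary length cap.
# Anything else is rejected, so novel exploit styles fail by default.
VALID_PARAM = re.compile(r"^[0-9a-z._'-]{1,64}$", re.IGNORECASE)

def accept_param(value):
    """Return value unchanged if it passes the whitelist, else raise."""
    if VALID_PARAM.match(value):
        return value
    raise ValueError('rejected request parameter: %r' % value)

assert accept_param("my-page_v2.html") == "my-page_v2.html"

raised = False
try:
    accept_param("../../etc/passwd")   # '/' is not whitelisted
except ValueError:
    raised = True
assert raised
```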
Also, since most scripting languages have full access to the original HTTP request headers, there is no guarantee that a script, once invoked, won't read REQUEST_URI directly and extract information itself, bypassing the careful pattern-based validation in the mod_rewrite rules you used to pass the request to the script.
One way around that would be to use a 'filter' rule at the very top of your rules, something like:
RewriteCond %{QUERY_STRING} [^&=a-z0-9._\-] [NC,OR]
RewriteCond %{REQUEST_URI} [^/#a-z0-9._\-] [NC,OR]
RewriteCond %{THE_REQUEST} \.(php|pl)\ HTTP/ [NC]
RewriteRule .* - [F]
Note that I just typed this code, and I may have omitted some characters from the 'allowed character' lists that your site requires to function; I erred on the side of caution in composing them. I also assumed that all of your published URLs are static in appearance and refer to page names that do not contain ".pl" or ".php", and that all scripts are invoked by rewriting those static URLs. In other words, this approach works if your published URLs refer to .html page names or to page names without any file extension. If that holds, there is no way a script can be invoked without the request first passing through the RewriteRule filter.
Jim