mod_rewrite in httpd.conf file

Filtering spurious strings from URLs and queries

         

umakanth

8:18 pm on Jun 8, 2005 (gmt 0)

10+ Year Member



I have turned mod_rewrite on in the httpd.conf file and added rewrite rules to filter out certain characters and words, so that strings like '<' and '<SCRIPT>' are replaced with ''.

The rewrite rules work on the URL, but the parameter list and the values do not get rewritten.

ex:

[myse<script>rver.com...]

will change to

[myserver.com...]

& I want it to work like this...

[myserver.com...]

Any Ideas?

Thanks

jdMorgan

8:28 pm on Jun 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



umakanth,

Welcome to WebmasterWorld!

Please post your RewriteRule(s) so we can discuss them.

Jim

umakanth

8:45 pm on Jun 8, 2005 (gmt 0)

10+ Year Member



Jim,

Thanks, these are the rules...

<IfModule mod_rewrite.c>
RewriteEngine ON
RewriteLog rewrite.log
RewriteLogLevel 9
RewriteCond %{HTTP_HOST} ^.*$ [NC]
RewriteRule ^/(.*)<SCRIPT>(.*)$ /$1$2 [PT]
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [PT]
RewriteRule ^/(.*)<Script>(.*)$ /$1$2 [N]
RewriteCond %{QUERY_STRING} ^.*$[NC]
RewriteRule ^/(.*)<SCRIPT>(.*)$ /$1$2 [PT]
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [PT]
RewriteRule ^/(.*)<Script>(.*)$ /$1$2 [PT]
</IfModule>

Umakanth

jdMorgan

8:53 pm on Jun 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmmm...

I guess I don't understand your question. The first and second group of rules appear to be redundant, since any value will be accepted for HTTP_HOST and QUERY_STRING. Also, a RewriteCond only applies to the single RewriteRule that follows it, so I do not understand what your code is intended to accomplish.

Can you ask a more specific question related to your code, and provide more examples of the requested URLs and the desired output URLs?

Jim

umakanth

9:10 pm on Jun 8, 2005 (gmt 0)

10+ Year Member



Yes, it is redundant. I was trying out options by commenting out code, and all of it got pasted.

This is what I had initially in the httpd.conf file.

<IfModule mod_rewrite.c>
RewriteEngine ON
RewriteLog rewrite.log
RewriteLogLevel 9
RewriteCond %{HTTP_HOST} ^.*$ [NC]
RewriteRule ^/(.*)<SCRIPT>(.*)$ /$1$2 [PT]
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [PT]
</IfModule>

Currently these rules are working on part of the url

i.e. [myse<script>rver.com?p_id=8218...]
is rewritten to [myserver.com?p_id=8218...]

What is not working is if the url is

[myserver.com?p_i<script>d=8218...]

It should change to

[myserver.com?p_id=8218...]

Let me know if you have questions.

Thanks
Umakanth

jdMorgan

9:29 pm on Jun 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The RewriteCond is unnecessary, because it does nothing. So delete it. That leaves this:

RewriteRule ^/(.*)<SCRIPT>(.*)$ /$1$2 [PT]
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [PT]

But that's redundant --differing only in case-- so change that to just one line:

RewriteRule ^/(.*)<script>(.*)$ /$1$2 [NC,PT]

However, RewriteRule cannot "see" a query string, so it must be handled separately, in much the same way as you've handled the URL:

RewriteCond %{QUERY_STRING} ^(.*)<script>(.*)$ [NC]
RewriteRule ^/(.*)$ /$1?%1%2 [PT]

If "<" is always followed by "script>", then you may substantially increase your server's performance by using a pattern like ([^<]*)<script>(.*) instead of (.*)<script>(.*), because this pattern eliminates the ambiguity of using two .* subpatterns in the same pattern and avoids multiple trial matches.
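As a rough illustration of the difference between the two pattern styles, here is a minimal Python sketch. Python's `re` module stands in for mod_rewrite's regex engine, and the sample path and the helper `strip_once` are made up for this example. With a single `<script>` both patterns capture the same pieces, but a prefix that cannot contain "<" has only one place to stop, so the engine has no alternative split points to try.

```python
import re

# Hypothetical request path; not taken from the thread.
path = "/shop/ite<script>ms/list"

# Two .* groups: the engine may try several split points before it succeeds.
loose = re.compile(r"^/(.*)<script>(.*)$", re.IGNORECASE)
# Prefix that cannot contain "<": it must stop at the first "<".
strict = re.compile(r"^/([^<]*)<script>(.*)$", re.IGNORECASE)

def strip_once(pattern, url_path):
    """Remove one <script> occurrence, mimicking the substitution /$1$2."""
    match = pattern.match(url_path)
    if match is None:
        return url_path  # no <script> found, leave the path alone
    return "/" + match.group(1) + match.group(2)

loose_result = strip_once(loose, path)
strict_result = strip_once(strict, path)
print(loose_result, strict_result)
```

Both calls return the same cleaned path here; the sketch shows only that the patterns are interchangeable for this input, not how much faster one is on a real server.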

Also, if you need to eliminate these corrupted URLs from search engine listings, then generate an external 301 redirect, instead of immediately invoking the content-handler.

Jim

umakanth

9:45 pm on Jun 8, 2005 (gmt 0)

10+ Year Member



Jim, Awesome! it works!

Thank you very much.

One question: if we have multiple occurrences of <script> in the URL it does not work. Is there an additional flag I need to give in the rewrite rule?

jdMorgan

9:55 pm on Jun 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This gets ugly. The options are to "stack" multiple instances of the rules, with [PT] removed from all but the last instance, or to use the [N] flag to restart mod_rewrite processing from the beginning. This can be really slow, especially if these rewriterules are not at the very top of the file.

There's also the case where both the URL and the query string contain "<script>", and this case is not handled by the code I posted, either. How you fix that depends on whether you want to use [N] or not. And that depends on how many times per hour/day you get a request for these malformed URLs.

There are dozens of ways to solve the problem, but the most efficient way depends on whether performance is an issue -- I mean, you can have performance efficiency or code simplicity.

Jim

umakanth

10:03 pm on Jun 8, 2005 (gmt 0)

10+ Year Member



Thank you very much for helping me out

My requirement is to filter out a list of characters/words from incoming URLs, and we use the Apache web server.

Is there an alternative solution to mod_rewrite to achieve this, maybe blocking such URLs, or a standard procedure/function that can process each incoming URL?

Thanks again
Umakanth

jdMorgan

2:14 am on Jun 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are many solutions and many ways to implement those solutions, but you must decide precisely what you want to do:

Block them -- Return 403-Forbidden response.
Rewrite them -- Return alternate content page(s).
Redirect them -- Redirect to a different URL or URLs.
Mask them -- Internally rewrite them to the correct local URL-path, pretending there was no error in the URL.
Correct them -- Externally redirect to the correct URL (helps to correct search engine listing URLs).

All of these solutions could be implemented with mod_rewrite, but mod_rewrite is somewhat limited in handling multiple corrections per requested URL. If all of your content is generated by a php or CGI script, then perhaps you could do the corrections in the script more easily.

Jim

umakanth

1:37 pm on Jun 9, 2005 (gmt 0)

10+ Year Member



Thanks for the options. I was going with the last option, but the only constraint is that all of our pages are generated by PL/SQL code in Oracle, and to implement this change we would need to change each and every module to call generic code that strips or validates the input and sends an error message.

I am trying to avoid going through that big a change and would rather implement something on the server. Is there an alternative to mod_rewrite on the Apache server, or is there a greater performance issue if we implement mod_rewrite to look for multiple instances of these characters?

Thanks
Umakanth

jdMorgan

4:02 pm on Jun 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Is there an alternative to mod_rewrite on apache server?

Other than using a script? - No, I don't think so.

> OR is there a greater performance issue if we implement mod_rewrite to look for multiple instances of these characters?

The performance penalty is proportional to the number of URL corrections needed per unit time.

I'm afraid that I can't offer much more advice except to say that it is time to code this up and test it. I could predict success or failure, but I do not know how many bad URL requests you get per second, how much overall traffic your site gets, how fast your server and backbone connection are, or any of the other hundreds of factors that will affect the outcome. So, regardless of what I predict, there is only a 50% chance I'll be right. It's time for testing.

Hopefully, you know how to eliminate the source of these bad URLs. In that case, this problem and any related performance penalties will be limited to a few months' time. In the meantime, all you can do is implement a fix and try to performance-optimize it to minimize the interim impact.

The more precise your regex patterns, the faster the code can be processed. As noted above, avoid the use of ambiguous ^(.*)xyz(.*)$ -type patterns if at all possible.

Jim

umakanth

7:45 pm on Jun 9, 2005 (gmt 0)

10+ Year Member



Thanks for the reply. I am trying to use the 'stack' option to filter multiple instances, but if I do not give PT it does not work, and if I give PT it works for the first one only. Is there anything I am missing?

Following is the code

RewriteEngine ON
RewriteLog D:\Apache\logs\rewrite.txt
RewriteLogLevel 1
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [NC]
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [NC,PT]
RewriteCond %{QUERY_STRING} ^(.*)<script>(.*)$ [NC]
RewriteRule ^/(.*)$ /$1?%1%2 [PT]

Thanks
Umakanth

jdMorgan

9:05 pm on Jun 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How many times can <script> appear in a URL? Twice, three times, ten? Please be as specific as possible about your problem.

Jim

umakanth

9:19 pm on Jun 9, 2005 (gmt 0)

10+ Year Member



Jim,

We do not know how many times it could occur. Currently I am trying to implement it for 3 times, assuming that would be the maximum number of times it could occur.

Thanks
Umakanth

jdMorgan

9:55 pm on Jun 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you can try this. I am *not* sure it will work.

RewriteEngine ON
RewriteLog D:\Apache\logs\rewrite.txt
RewriteLogLevel 1
# Strip URLs
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [NC,E=Stripped:yes,N]
# Strip query strings
RewriteCond %{QUERY_STRING} ^(.*)<script>(.*)$ [NC]
RewriteRule ^/(.*)$ /$1?%1%2 [E=Stripped:yes,N]
# Pass through stripped requests
RewriteCond %{ENV:Stripped} ^yes$
RewriteRule .* - [PT,L]

If <script> is found it is removed. The variable "Stripped" is set to "yes", and then mod_rewrite parsing is restarted from the beginning. If no more <script>s are found, then the variable "Stripped" is tested. If it has been set by a previous pass, then the current (correct) URL is rewritten to itself, and the PT flag is set.
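The restart behaviour described above can be imitated outside Apache. The following is a minimal Python sketch, not the server's actual processing: the function `strip_request`, its `max_rounds` safety cap, and the sample request are assumptions made for illustration, and the boolean `stripped` merely plays the role of the "Stripped" variable.

```python
import re

SCRIPT_IN_PATH = re.compile(r"^/(.*)<script>(.*)$", re.IGNORECASE)
SCRIPT_IN_QUERY = re.compile(r"^(.*)<script>(.*)$", re.IGNORECASE)

def strip_request(path, query, max_rounds=50):
    """Strip one <script> per round, then start over from the first rule.

    Returns (path, query, stripped). max_rounds is a safety cap added for
    this sketch only; it is not part of the rules in the thread.
    """
    stripped = False
    for _ in range(max_rounds):
        m = SCRIPT_IN_PATH.match(path)
        if m:
            path = "/" + m.group(1) + m.group(2)
            stripped = True
            continue  # like [N]: restart from the top
        m = SCRIPT_IN_QUERY.match(query)
        if m:
            query = m.group(1) + m.group(2)
            stripped = True
            continue  # like [N]: restart from the top
        break  # nothing left to strip
    return path, query, stripped

# Hypothetical request with two occurrences in the path and one in the query.
result = strip_request("/pa<script>ge<SCRIPT>", "p_i<script>d=8218")
print(result)
```

Each round removes exactly one occurrence, so a request with many occurrences costs that many passes, which is the slowness warned about in the post.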

I *am* sure it will be slow if you receive a lot of these requests.

Jim

umakanth

3:12 pm on Jun 10, 2005 (gmt 0)

10+ Year Member



Thank you Jim, that works wonderfully...

Last question: if we need to add more filter characters, can I add them like this? Sorry, I am new to regular expressions, so this may be a silly question.

<IfModule mod_rewrite.c>
RewriteEngine ON
RewriteLog D:\Apache\logs\rewrite.txt
RewriteLogLevel 1
# Strip URLs
RewriteRule ^/(.*)<script>(.*)$ /$1$2 [NC,E=Stripped:yes,N]
RewriteRule ^/(.*)<\/script>(.*)$ /$1$2 [NC,E=Stripped:yes,N]
# Strip query strings
RewriteCond %{QUERY_STRING} ^(.*)<script>(.*)$ [NC]
RewriteRule ^/(.*)$ /$1?%1%2 [E=Stripped:yes,N]
# Pass through stripped requests
RewriteCond %{ENV:Stripped} ^yes$
RewriteRule .* - [PT,L]
</IfModule>

jdMorgan

4:40 pm on Jun 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1) There is no need to escape the "/" character in mod_rewrite.
2) You can cover both cases with one rule:

RewriteRule ^/(.*)</?script>(.*)$ /$1$2 [NC,E=Stripped:yes,N]

You will need to make the same change to the query string test.

And again, if all filtered strings start with "<", then use a more specific pattern:


RewriteRule ^/([^<]*)</?script>(.*)$ /$1$2 [NC,E=Stripped:yes,N]

This will run much faster.
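To see what the optional slash does, here is a minimal Python sketch using the same pattern shape, with `re.IGNORECASE` standing in for [NC]. The helper `strip_tag` and the sample paths are hypothetical. The last call shows one situation in which this stricter pattern would not match at all: an unrelated "<" earlier in the path.

```python
import re

# Same shape as the rule pattern above; re.IGNORECASE stands in for [NC].
tag = re.compile(r"^/([^<]*)</?script>(.*)$", re.IGNORECASE)

def strip_tag(url_path):
    """Remove one <script> or </script>, like the substitution /$1$2."""
    m = tag.match(url_path)
    return "/" + m.group(1) + m.group(2) if m else url_path

opening = strip_tag("/pa<script>ge")    # opening tag
closing = strip_tag("/pa</SCRIPT>ge")   # closing tag, mixed case
# Caveat: if an unrelated "<" comes first, [^<]* cannot get past it,
# so the pattern does not match and the path is returned unchanged.
blocked = strip_tag("/a<b/pa<script>ge")
print(opening, closing, blocked)
```

So the speed gain from `[^<]*` rests on the assumption stated in the post, that every "<" in the request begins one of the filtered strings.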

Jim

umakanth

5:51 pm on Jun 10, 2005 (gmt 0)

10+ Year Member



Thanks Jim, but for some reason this condition does not work.

RewriteRule ^/([^<]*)</?script>(.*)$ /$1$2 [NC,E=Stripped:yes,N]

and what is interesting is the below condition works for querystring but not for the main url

RewriteRule ^/(.*)</script>(.*)$ /$1$2 [NC,E=Stripped:yes,N]

I tried with "/?" also initially.

Could you suggest a good book or a reference link on the web to get familiar with regular expressions on Apache? You have been really helpful, but I need to get the hang of it too.

Thanks again
Umakanth

moltar

6:00 pm on Jun 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check out mod_security. I think it's more suitable for this job.

jdMorgan

6:14 pm on Jun 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> or a reference link on the web

Sure, see our forum charter [webmasterworld.com], or search for "regular expressions tutorial" or "regex tutorial".

Jim

umakanth

6:30 pm on Jun 10, 2005 (gmt 0)

10+ Year Member



Jim, thanks for that link on the tutorial.

Is mod_security another module in Apache, or do we need to install it separately? And we would write the regular expressions for mod_security instead of mod_rewrite, correct?

umakanth

8:54 pm on Jun 10, 2005 (gmt 0)

10+ Year Member



Jim,

I have an additional request: we are now required to implement this filtering on POST requests. Does mod_rewrite work on form POST values?

Any idea?

Thanks
Umakanth

jdMorgan

4:36 am on Jun 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, POST data is carried in the body of an HTTP message (as opposed to the header) and is not visible to mod_rewrite. That kind of filtering has to be done in the script that accepts the data.

Remember, to avoid insanity, decide what you will *accept* in posted data and filter everything else; don't try to think up every possible thing that you might want to reject. The latter approach can be a maintenance nightmare, and can lead to madness.
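A minimal sketch of the accept-list idea, written in Python purely for illustration (the thread's application is PL/SQL, and no such script appears in it). The allowed character set, the 64-character limit, the function `validate_form`, and the sample field names are all assumptions of this example.

```python
import re

# Assumed policy for this sketch: a field value may contain only letters,
# digits, underscore, dot, hyphen and space, up to 64 characters.
ALLOWED_VALUE = re.compile(r"[A-Za-z0-9_.\- ]{1,64}")

def validate_form(fields):
    """Return the names of fields whose values fall outside the allow-list."""
    return [name for name, value in fields.items()
            if ALLOWED_VALUE.fullmatch(value) is None]

# Hypothetical posted form data.
rejected = validate_form({
    "p_id": "8218",
    "comment": "nice <script>alert(1)</script>",
})
print(rejected)
```

Because the check names what is acceptable, a new attack string needs no new rule; anything outside the allowed set is rejected by default.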

Jim

umakanth

5:01 am on Jun 12, 2005 (gmt 0)

10+ Year Member



Jim

"That kind of filtering has to be done in the script that accepts the data."

Any ideas how they do it on an Apache server? We use Oracle app server, which is based on Apache HTTP Server.

Thanks
Umakanth

umakanth

6:05 pm on Jun 16, 2005 (gmt 0)

10+ Year Member



Thank you Jim for all your help, I was able to use mod_security and implement all the filtering rules we need for url strings as well as form post values.

Installation on windows apache server was a little tricky because of the compiler and header files but the module itself works wonderfully.

Keep up the good work.

Regards
Umakanth