Forum Moderators: phranque


Need help forbidding URLs with "./"

Can't get my .htaccess code to work

         

Merganser

5:33 am on Jan 15, 2012 (gmt 0)

10+ Year Member



I am trying to block URLs with "../../../" in them. Attacks with this are showing up in my logs. For example: mywebsite.com/admin/../../something.php. I have attempted the following but it does not work:

# Block Traffic if URL contains known malicious content
RewriteCond %{REQUEST_URI} \.\./ [NC]
RewriteRule ^.*$ - [F,L]

I would greatly appreciate any hints as to what is wrong with this.

lucy24

7:41 am on Jan 15, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ugh. I know those. Do you have real php pages on your site? If not, there's an alternative approach to the whole thing.

Note that you don't need separate Rules and Conditions. It will run faster and cleaner with a simple, conditionless, unanchored

RewriteRule \.\./ - [F,L]

Any Rule expressed as .* (the anchors are superfluous) should be considered an absolute last resort. Put as much information into the Rule as possible, so Apache doesn't even have to look at the Condition unless there's a good chance of a match.

Now, technically you don't need to block them at all. They are asking for files that don't exist, and a 403 won't make them go away any faster than a 404. It only matters if you want to play hardball by grabbing anyone who asks for ../ and running a script that does something Truly Evil to them. But generally this only works if you have your own server.

Now the killer question: what do you mean by "it does not work"? Are they still coming through as 404 instead of your desired 403, or does something else happen?

The really bad news is that each request is an island. That is, serving a 403 will not stop someone from coming back a nanosecond later and asking for something else. Especially robots, which are programmed to go through a whole list. So if you were hoping to drive them away once and for all, it won't work. The same thing applies to blocking by IP. They'll keep hammering away, no matter how many times they run into a locked door.

g1smd

8:07 am on Jan 15, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule \.\. - [F] should be enough.

The accesses will still appear in the logs, but the bandwidth used will be much reduced.

Merganser

4:31 pm on Jan 15, 2012 (gmt 0)

10+ Year Member



Thanks for the help from both of you.

I am using an awful lot of php and am reasonably accomplished at php (but not with mod_rewrite and regular expressions).

I used the condition because I have other lines that are being banned as well. Thus my intent was to have multiple conditions followed by the one rewrite rule (I just did not display them in this post). I suppose I could have multiple rewrite rules and no conditions but was not aware of a benefit either way. If there is a benefit to multiple rewrite rules as opposed to using the conditions, please let me know and I can convert it over.

I suppose I don't understand your comment that .* should be a last resort. Could you explain? What exactly is an anchor?

Here is an exact example from my logs:
www.mywebsite.com/?file=../../../../../../proc/self/environ%00

This results in a 200 and display of www.mywebsite.com. Thus, I would rather give them a 403/404 as opposed to displaying the page. This way, when I go through my logs and see 403/404, I can disregard and move on.

When I say that it does not work, I mean that you still get a 200 (my main page is displayed - www.mywebsite.com) and not a 403 or 404.

I have a lot of this malicious stuff going on and figure I could ban 95% of it with probably 5 conditions/rules. This is what I am trying to accomplish. The particular pattern \.\./ is the one I cannot get to work.

I have tried "RewriteRule \.\. - [F]" per your suggestion which also does not seem to work. So, I am still stuck.

For the record, I am using other rewrite rules successfully, so mod_rewrite is apparently enabled.

g1smd

5:30 pm on Jan 15, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule pattern matching can see only the requested path.

It does not see the hostname or query string data.

Query string data is not a part of the URL, it is data attached to the end of a URL.

To look at the Query String you need a preceding RewriteCond looking at QUERY_STRING.
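
As a minimal sketch of that (assuming standard mod_rewrite in an .htaccess file):

```apache
# Refuse any request whose query string contains "../".
# The bare "^" pattern matches every URL, so every request gets
# checked against the Condition: the simplest form, not the fastest.
RewriteCond %{QUERY_STRING} \.\./
RewriteRule ^ - [F]
```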

lucy24

9:27 pm on Jan 15, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, whoops, query string. Do a Forums search and it will pop right up. And I would hate to think of an Apache installation that didn't have mod_rewrite enabled ;)

Here's the most fundamental thing you need to know about mod_rewrite. It goes two steps forward, one step back. That means it looks at the Rule first. Only if the Rule potentially matches does it evaluate the Conditions and see if they also match. Hence the suggested Rule \.\./ et cetera that you got in two-part harmony above. If the ../ came as part of the actual URL-- which I've seen-- that would be all you need.

But when the thing you're screening for is in the query, it can only go in the Condition. (Note, as an aside, that results from search engines also contain question marks-- but those aren't query strings, they're part of the Referer.) Even then, you can express your Rule as

RewriteRule \.php$ - [F,L]

so Apache doesn't have to stop and look at the conditions when it gets a request for things like images that will never have a query string anyway. The flag [F] carries an implied [L] but the L does no harm and it's a good habit to throw it in all the time except when you explicitly don't want it there.

If you have many different Conditions that all lead to the same Rule, they need to be joined in one of two ways, and you can combine both approaches in the same group.

#1
RewriteCond blahblah [OR]
RewriteCond blahblah [OR]
RewriteCond blahblah
RewriteRule blahblah

making sure not to put [OR] at the end of your last Condition. Unless you are testing out your custom 500 page.

#2
RewriteCond (blahblah|blahblah|blahblah)
RewriteRule blahblah

So you can say

RewriteCond %{QUERY_STRING} (\.\./|bogusquery|otherbogusquery|gibberish) [OR]
RewriteCond %{QUERY_STRING} query-that's-too-long-and-complicated-to-fit-with-all-the-others [OR]
RewriteCond %{QUERY_STRING} another-long-complicated-query
RewriteRule \.php$ et cetera

The parentheses with the pipes | are not a necessary part of the Regular Expression. But again they are a good habit to ensure you're separating exactly what you want to separate. For example:

^foo|bar$

means "either begins with 'foo' OR ends with 'bar'", while

^(foo|bar)$

means "consists entirely of 'foo' OR consists entirely of 'bar'".

* * *

Add-on:
If you speak fluent php there's an alternative approach. Let your script itself look at the query string.

1. Check for queries that are ignored. Those are things that your site used to use, or that accidentally crop up in legitimate links. The script quietly deletes them.

2. Any request that now has no query at all gets the usual vanilla handling.

For the rest, there are two more steps. You can fiddle with the structure to taste.

3. Queries that are used. The script does its stuff-- but doesn't send the user to the page yet.

4. Any leftover queries are bad. At this point, the script itself issues the [F] and user gets sent nowhere except out the door with a swift kick in the programming.
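
A rough sketch of those four steps in php (every function and parameter name here - serve_page, load_file, "file", "old_id" - is a hypothetical placeholder, not anything from your actual site):

```php
<?php
// 1. Silently delete parameters that are obsolete or harmless noise.
unset($_GET['old_id']);

// 2. No query left at all: give the request the usual vanilla handling.
if (empty($_GET)) {
    serve_page();   // hypothetical normal page handler
    exit;
}

// 3. Handle each parameter the site actually uses, then remove it.
//    The whitelist regex rejects anything containing "../" or "%00".
if (isset($_GET['file']) && preg_match('/^[a-z0-9_-]+$/', $_GET['file'])) {
    load_file($_GET['file']);   // hypothetical handler
    unset($_GET['file']);
}

// 4. Anything left over is unrecognised: slam the door.
if (!empty($_GET)) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

// All queries accounted for: now send the user on to the page.
serve_page();
```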

Merganser

6:45 am on Jan 16, 2012 (gmt 0)

10+ Year Member



OK - Looks like my whole problem was the Query String issue. Thanks for the lengthy explanation. Got it working and I understand a bit better now.

I'll consider the .php approach.

Let me ask:

1) If I have a simple page named "mypage.php" with only text and 3 images on it, and someone requests it, will one item get evaluated against the rule (mypage.php) or will 4 items get evaluated against the rule (mypage.php + 3 images)? I have been thinking only one but, after your post and some reflection, now I am thinking all 4. Thus if I use something like:

RewriteCond %{QUERY_STRING} \.\./
RewriteRule ^.*$ - [F,L]

on a typical website, perhaps 50 items are evaluated against it on every page load? If this is how it works, I can understand the need to be conservative.

2) I also banned question marks in the query string - would that be ill advised (in reference to your comment on search engine Referrer)?

lucy24

8:16 am on Jan 16, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For question #1: Yes, exactly. In the RewriteRule itself, there is no difference between a page and all the other stuff-- not just images but stylesheets, scripts and whatnot. mod_rewrite doesn't know and doesn't care which things were typed/clicked by a human (normally the page alone) and which things were requested by the browser after it has seen the page.

So if the Rule says simply .* (again, the anchors aren't needed) then mod_rewrite has to check the Conditions for every single request that comes in.

So you start by constraining your rule to

RewriteRule \.php$ et cetera

meaning that mod_rewrite doesn't even look any further if it's a non-page. If there are things you do want to do with non-page files-- anti-hotlinking routines are a common one-- make those rules with only the extensions that apply.
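
For example, a hotlink-blocking rule scoped to images only might look like this (a sketch; example.com stands in for your own domain):

```apache
# Block image requests whose referer is set but is not our own site.
# Because the Rule pattern names only image extensions, page requests
# never reach these Conditions at all.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !example\.com [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]
```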

2) I also banned question marks in the query string - would that be ill advised (in reference to your comment on search engine Referrer)?

Queries and referers are entirely different animals. mod_rewrite will never mistake them for each other. I don't think a query can contain an additional question mark. Don't remember seeing one. But blocking them in a RewriteCond won't make your server explode.

The "referer" in mod_rewrite is one vast mouthful, including two things that are explicitly left out of a regular RewriteRule: the domain name and the query. mod_rewrite doesn't dissect the referer into its separate parts. If there was a query-- like when you get a visitor from a search engine-- the referer string will include a literal question mark.

In fact I've got Rules myself that look at whether the visitor came from a search engine. The rough-and-dirty way is

RewriteCond %{HTTP_REFERER} \?

meaning simply "the referer string contains a literal question mark". No anchors; the ? just has to be in there somewhere. There are also rules for auto-referers: when a robot thinks (probably rightly) that it will look less obvious if it comes with a referer, so it puts the name of the requested file in the Referer slot.

And I don't use php, so I've got another rule that slams the door on almost all requests ending in \.php

g1smd

9:22 am on Jan 16, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With the various pattern matching options available, this is why I like using extensionless URLs for pages.

I can treat anything with an extension of any sort one way and everything else the other way (everything else being both extensionless URLs and URLs ending with a slash, i.e. the index pages).

Ahead of all that, any requests ending .htm, .html and .php are redirected to strip the extension.

URL requests with parameters are redirected to their friendly version. All requests other than www are redirected to www, etc.
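
A sketch of that arrangement (assuming the pages are physically stored as .php files; adapt the extensions to suit):

```apache
RewriteEngine On

# Redirect only when the client literally asked for an extension.
# Checking THE_REQUEST (the raw request line, which internal rewrites
# never alter) avoids a redirect loop with the rewrite below.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]+)\.(php|html?)[?\ ]
RewriteRule ^ /%1 [R=301,L]

# Internally map the extensionless URL back onto the real file.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([^.]+)$ $1.php [L]
```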

Merganser

4:57 am on Jan 17, 2012 (gmt 0)

10+ Year Member



As a test I banned an individual image in one of my pages and, in line with your explanation, the page loaded but not the image. Pretty cool. I started rewriting my .htaccess file (eliminating the anchors where appropriate :)) and ran into additional questions and problems.

1) If a page URL matches my RewriteCond, what happens to the other elements (images, etc.) in the page? Obviously a 403 is displayed to the user, so the images, stylesheets, etc. also will not be displayed. But did they get evaluated against the RewriteCond? Or, because the page did not get sent to the browser, is it that the browser was never able to follow up and request the other elements?

If I ban a user agent, for instance:

RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]

What happens? I am guessing the initial page request gets denied, and therefore the browser is never able to follow up and request the images, etc. that were in the page. This would mean that if the initial page request gets banned, whether by URL, IP, user agent, etc., then the items within the page never even get evaluated, because the browser did not get a chance to learn of their existence and request them. Is this correct?

2) What is the syntax to ban the following:

MySite.com/?file=../../../../../../proc/self

Notice that the ? does not follow a page name, like "file.php" and I have an implied index.php. If I want to specify the page type in the rule (per your suggestions) does the implied index.php page get taken into consideration such that the following would work:

RewriteCond %{QUERY_STRING} \.\./ [OR]
RewriteCond %{REQUEST_URI} \.\./
RewriteRule \.php - [F,L]

Or, will the \.php not match because it is not literally in the request? If it does not match, then I would not know how to ban the ../ without using the ill-advised .* in the Rule. What are your thoughts?

Thanks again for the help from both of you.

lucy24

6:31 am on Jan 17, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Or, because the page did not get sent to the browser, is it that the browser was not able to follow-up and request the other elements?

You got it. Think of the term "user agent" literally: your browser talks to the server on your behalf. Ordinarily, a human requests a page, it is delivered to the browser, and the browser then reads the page to see what else is required. The server itself has no idea; it just hands over what the browser asks for. If the browser isn't allowed to see the page, it will never get as far as asking for all those other files.

Or, will the \.php not match because it is not literally in the request?

Interesting question. It depends on whether directory indexing happens before or after mod_rewrite. My server used to do it before, so RewriteRules only worked if I expressed them as "\.html$". Now it seems to behave differently. To cover all bases, say

(\.php|/)$

i.e. either a final .php or a naked directory. I have to say I don't understand why this also works with the top-level directory-- which can be expressed in mod_rewrite as ^$ or nothing-- but I just tried it on myself to confirm that it does.
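
Putting the whole thing together for the original ../ problem, a sketch might read:

```apache
# Catch "../" whether it arrives in the query string or in the path,
# for .php pages and directory (index) requests alike.
RewriteCond %{QUERY_STRING} \.\./ [OR]
RewriteCond %{REQUEST_URI} \.\./
RewriteRule (\.php|/)$ - [F]
```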

Merganser

4:56 am on Jan 19, 2012 (gmt 0)

10+ Year Member



I guess I am not getting the same results. Basically I am wanting to ban both of the following (due to use of the term "admin"):

MyWebsite.com/admin/blahblah...
MyWebsite.com/index.php/admin/blahblah...

I don't believe I want the $ for this application but I did try with and without the $.

The .php portion seems to work fine but, the / portion does not seem to catch the version without index.php. So, I still can't figure out a way to capture both without using the ill-advised .*

Did I misunderstand or do something wrong?

lucy24

7:35 am on Jan 19, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The .php portion seems to work fine but, the / portion does not seem to catch the version without index.php.

Well, rules aren't generally made for malformed URLs ;) There simply isn't supposed to be anything after the extension-- except possibly a query string, which doesn't count.

I don't suppose your admin pages contain an awful lot of images and decorative stuff do they? If not, you can take the other route:

^([^/]+/)*admin/

That's to weed out unrelated names like "sysadmin". If you just want everyone to stay the ### out of everything with "admin" in the name, start with

RewriteRule admin - [F]

and then figure out the conditions so you're not locking yourself out. If the admin directory contains style sheets, include a referer line to exclude the exact named page(s) that call the style sheets. To let yourself in, include your IP. Or user-agent, if it's sufficiently distinctive. Or simply exclude the exact, correct name of the real page(s)-- which I assume the robot can't really get to, since they'll be password-protected in some way.

If your admin files are in a directory locked through htpasswd, you shouldn't need to do anything further in htaccess. But a quick eyeballing of your logs will make it clear why your admin files should be given a non-predictable name and nested inside folders with similarly non-predictable names. That's assuming you're allowed to rename them.
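
As a sketch of that shape (192.0.2.1 is a placeholder; substitute your own address):

```apache
# Refuse anything with "admin" in the path, except requests
# from your own IP, so you don't lock yourself out.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.1$
RewriteRule admin - [F]
```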


:: now back to idle speculation about what it means when blahblah/admin/index.jsp brings up a perfectly blank page ::

g1smd

7:40 am on Jan 19, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, I still can't figure out a way to capture both


You could always have two separate rules. Nothing forces you to have just one rule.

Merganser

2:41 am on Jan 20, 2012 (gmt 0)

10+ Year Member



Ok - Thanks to both of you. I think I can get by now.

Merganser

8:45 pm on Jan 21, 2012 (gmt 0)

10+ Year Member



After further trials, I could still use some assistance but, I will post as a new thread.

g1smd

10:12 pm on Jan 21, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



IF the topic is the same or related, carry on in this one...