403 whitelist rule breaks folders without trailing slash

WhiteList file types via rewrite rule breaks the trailing slash for folders

         

MattyUK

7:23 pm on Nov 24, 2006 (gmt 0)

10+ Year Member



Hi

I have a rewrite rule to give a 403 response to any request for a file type not within a list of allowed extensions. A whitelist security measure.


#Allowed file extensions, if not in this group request suffers a 403
RewriteRule !(^$|/$|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$) - [F]

It breaks the auto appending of the trailing slash for folders. i.e.
http://www.example.com/afolder
gives a 403 rather than having the tail slash / added to it.
If the slash is added manually:
http://www.example.com/afolder/
Then the default page is served as desired.
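
To see why the slashless request is refused, the pattern can be exercised outside Apache. This is a minimal Python sketch, not the server's actual code path. It assumes that in a per-directory (.htaccess) context the pattern is tested against the path with its leading slash removed, and that Python's `re` module behaves like Apache's regex engine for these simple constructs. The helper name `is_forbidden` is made up for illustration.

```python
import re

# The whitelist pattern from the rule above, written with real "|" bars.
ORIGINAL = re.compile(
    r"(^$|/$|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$)"
)

def is_forbidden(path: str) -> bool:
    """Mimic the leading "!": [F] applies when the pattern does NOT match."""
    return ORIGINAL.search(path) is None

# Assumed inputs: URL-paths as .htaccess would see them (no leading slash).
for path in ["", "afolder/", "afolder/index.html", "afolder"]:
    verdict = "403" if is_forbidden(path) else "passed on"
    print(f"{path!r:24} {verdict}")
```

Only `afolder` falls through all three alternatives: it is not empty, does not end in a slash, and has no whitelisted extension. That is consistent with the 403 described above arriving before the trailing slash can be appended.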

I'm looking to resolve this.

Any ideas?

If this looks familiar to anyone, then they may have noticed the old post:
[webmasterworld.com...]

I had PC problems and missed the deadline to respond.

Thanks Jim. Great answer. Ever a great help.

The rule above is Jim's rule with a slight modification.

I originally went down a RewriteCond route since I felt the list would be easier to maintain. I must have been asleep, and have since discovered RewriteMap, which I am sure will generate further questions once I get my hands dirty with it. I can't seem to find any good introductions to it.

Thanks in advance for any help resolving the folder issue.

Matt

[edited by: MattyUK at 7:26 pm (utc) on Nov. 24, 2006]

jdMorgan

7:36 pm on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try using a negative character-group pattern like this:

RewriteRule !(^[^.]*|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$) - [F]

Loosely translated, the positive-logic part reads, "any URL-path without a filetype OR one of the following filetypes." That replaces your patterns for "/" and "" as well.

Jim

MattyUK

7:43 pm on Nov 24, 2006 (gmt 0)

10+ Year Member



Thanks Jim, I can confirm that works perfectly. I doubt I would have thought of it either. Thank you.

MattyUK

8:14 pm on Nov 24, 2006 (gmt 0)

10+ Year Member



Hmm, I'm sorry, I ran away happy too soon.

So using this rule:


RewriteRule !(^[^.]*|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$) - [NC,F]

Would we expect this url to give a 403:
http://www.example.com/index.403

Where at the moment it gives a 404.

Is [^.]* too greedy perhaps? It means "where you don't match anything zero or more times", right? So in context that's: match the rule when you don't match anything zero or more times, or don't match the following extensions. Oops, I think I confused myself...

I am using custom error pages within the htaccess below this rule.

I'm using these urls to test:
#200 #http://www.example.com/afolder
#200 #http://www.example.com/afolder/
#200 #http://www.example.com/afolder/index.html
#200 #http://www.example.com/index.html
#200 #http://www.example.com/index.html?testvar=test
#404 #http://www.example.com/index
#404or403? #http://www.example.com/index.
#403 #http://www.example.com/index.403
#403 #http://www.example.com/index.403?testvar=test
#403 #http://www.example.com/index.html.403
#403 #http://www.example.com/index.html.403?testvar=test
#403 #http://www.example.com/index.403?testvar=test.html

With this physical structure:
/index.html
/afolder/index.html

Thanks again for any attention to this.

Matt

[edited by: MattyUK at 8:15 pm (utc) on Nov. 24, 2006]

jdMorgan

8:53 pm on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The "^[^.]*$" subpattern means, "allow requested URLs which are composed of zero or more characters but no dots/periods/full stops."

So that pattern allows requests for "/" or "/index", but won't allow any URL with a dot in it.

If the requested URL *does* contain a dot, then it will have to match one of your whitelisted filetypes to be allowed.

Did you flush your browser cache before testing?

Jim

MattyUK

8:54 pm on Nov 24, 2006 (gmt 0)

10+ Year Member



My latest best guess is:

RewriteRule !(^[^.]*/?$|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)(\?.*)?$) - [NC,F]

Which seems to work, but I feel it should fail on:
http://www.example.com/index.403?testvar=test.html
giving a 404 rather than 403. The final test.html being what should let it off when it shouldn't.
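
On the query-string worry: a RewriteRule pattern is tested against the URL-path only, and the query string is exposed separately (as %{QUERY_STRING} for use in a RewriteCond), so a trailing `test.html` inside the query should not reach the extension check at all. The rough Python sketch below illustrates that split, using the standard library's `urlsplit` as a stand-in for what the server does. `rule_input` is a hypothetical helper, and stripping the leading slash assumes a per-directory (.htaccess) context.

```python
from urllib.parse import urlsplit

def rule_input(url: str) -> str:
    """Return the part of the URL the rule's pattern is assumed to see:
    the path without its leading slash and without the query string."""
    return urlsplit(url).path.lstrip("/")

for url in [
    "http://www.example.com/index.403?testvar=test.html",
    "http://www.example.com/index.html?testvar=test",
    "http://www.example.com/afolder/",
]:
    print(f"{url}  ->  {rule_input(url)!r}")
```

If that holds, the `(\?.*)?` part of the pattern is never exercised and can probably be dropped.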

I'd greatly appreciate any insight/explanation.

Matt

MattyUK

9:10 pm on Nov 24, 2006 (gmt 0)

10+ Year Member



Hi Jim

I did flush yes. Always worth asking though. I can see the problem. A typo I think. Your earlier post missed the $ from ^[^.]*$

Otherwise it would have been 100% first time around.

For others reading this:
This rewrite rule operates as a security whitelist rule that prohibits requests for filetypes not in the regexp by delivering a 403 response.


RewriteRule !(^[^.]*$|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$) - [NC,F]

Largely, nay, pretty much all thanks to Jim.
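
For anyone wanting to try variations before touching a live .htaccess, the rule above can be run against the test paths from earlier in the thread. This is a sketch under assumptions: Python's `re` stands in for Apache's regex engine, `re.IGNORECASE` stands in for [NC], paths are given without the leading slash or query string, and the names `WHITELIST`, `CASES` and `is_forbidden` are illustrative only.

```python
import re

EXT = r"\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$"
# First alternative: a path with no dot at all. Second: a whitelisted extension.
WHITELIST = re.compile(r"(^[^.]*$|" + EXT + ")", re.IGNORECASE)

def is_forbidden(path: str) -> bool:
    """True when the negated rule would fire, i.e. the whitelist does not match."""
    return WHITELIST.search(path) is None

# Paths taken from the thread's test list.
CASES = [
    "afolder", "afolder/", "afolder/index.html", "index.html",
    "index", "index.", "index.403", "index.html.403",
]
for path in CASES:
    verdict = "403" if is_forbidden(path) else "allowed"
    print(f"{path!r:24} {verdict}")
```

The regex only decides between "403" and "allowed". Whether an allowed request then ends in a 200 or a 404 depends on what exists on disk, which is why `index` is allowed here but still produces a 404.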

Also our posts crossed. I posted before I saw your last response. Sorry for any confusion.

Thanks Jim.

Matt

[edited by: MattyUK at 9:13 pm (utc) on Nov. 24, 2006]

jdMorgan

9:17 pm on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I showed the additional "$" for clarity only. In the full regex, the final "$" within the outer parentheses would apply, making it unnecessary to add another one.

Jim

MattyUK

9:32 pm on Nov 24, 2006 (gmt 0)

10+ Year Member



oo that's odd. I mean, it does make sense but doesn't match up with my tests.


RewriteRule !(^[^.]*$|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$) - [NC,F]

will give a 403 when faced with:
http://www.example.com/index.403

Where:


RewriteRule !(^[^.]*|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)$) - [NC,F]

will give a 404 when faced with:
http://www.example.com/index.403

What you've said makes sense so I can't grasp why the above occurs.
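
One plausible explanation, offered as a sketch rather than a diagnosis of this particular server: in regular expressions, alternation binds more loosely than anchors, so in `(^[^.]*|\.(...)$)` the trailing `$` belongs to the second alternative only. The first alternative, `^[^.]*`, can then be satisfied by the start of any path, the negated rule never fires, and the request falls through to an ordinary 404. The Python below assumes `re` behaves like Apache's regex engine here; `WITH_DOLLAR` and `WITHOUT_DOLLAR` are illustrative names.

```python
import re

EXT = r"\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)"
WITH_DOLLAR = re.compile(r"(^[^.]*$|" + EXT + r"$)", re.IGNORECASE)
WITHOUT_DOLLAR = re.compile(r"(^[^.]*|" + EXT + r"$)", re.IGNORECASE)

path = "index.403"

# Without its own "$", the first alternative stops at the dot and still succeeds.
match = WITHOUT_DOLLAR.search(path)
print("without $:", repr(match.group(0)) if match else None)

# With "$", the whole path must be dot-free, so nothing matches and [F] applies.
print("with $:   ", WITH_DOLLAR.search(path))
```

Because the unanchored version matches every possible path, no request is ever refused by it, whatever its extension.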

I just retested emptying cache each step with the same results. I am using custom error pages that state 404 or 403 and it is that readout I am using to see if it is a 404 or 403.
This being the code and situated below the rule we're working on.


# CUSTOM ERROR PAGES - Start
# 401 Unauthorized
ErrorDocument 401 /error401.html
# 403 Forbidden
ErrorDocument 403 /error403.html
# 404 Not Found
ErrorDocument 404 /error404.html
# 500 Internal Server Error
ErrorDocument 500 /error500.html

I don't think the custom error documents would interfere.

Would we need extra brackets for the $ at the end to be effective in that manner?


RewriteRule !(^([^.]*|\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js))$) - [NC,F]

The above rule is untested. I'm just guessing.
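
A quick check of that guess in Python, under the same assumption that `re` mirrors Apache's regex behaviour for these constructs: wrapping both alternatives between `^` and `$` also pins the extension alternative to the start of the path, so only a bare `.html` would qualify and `index.html` would be refused. Keeping `^` inside the first alternative and moving `$` to just after the group appears to give the intended result. `GUESS` and `REGROUPED` are illustrative names, and neither pattern has been tried on a real server here.

```python
import re

EXT = r"\.(php5?|html?|txt|css|pdf|zip|gz|jpe?g|gif|png|bmp|ico|js)"
# The guess: "^" and "$" wrap both alternatives.
GUESS = re.compile(r"(^([^.]*|" + EXT + r")$)", re.IGNORECASE)
# Variant: "^" stays with the no-dot alternative, "$" applies to both.
REGROUPED = re.compile(r"(^[^.]*|" + EXT + r")$", re.IGNORECASE)

print(f"{'path':12} {'guess':8} {'regrouped'}")
for path in ["afolder", "index.html", "index.403"]:
    guess_ok = GUESS.search(path) is not None
    regrouped_ok = REGROUPED.search(path) is not None
    print(f"{path:12} {str(guess_ok):8} {regrouped_ok}")
```

A `True` means the whitelist matched and the request would be let through. With the negated rule, a `False` means a 403.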

I'm happy since I've got the rule that works (with the extra $) but I'd be interested in the explanation of the behavior above if you work it out. Could it be particular to my server environment?

Matt

[edited by: MattyUK at 9:36 pm (utc) on Nov. 24, 2006]

jdMorgan

11:10 pm on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Probably just some parentheses-parsing-precedence weird thing... Go with what works.

Jim

MattyUK

12:38 am on Nov 25, 2006 (gmt 0)

10+ Year Member



Thanks again. I suspect it is a rule that others will find useful as well. Couldn't have got there without you and I advanced on the way. Thanks.