mod rewrite - How to specify index page

Syntax needed to indicate index page only

Merganser

9:05 pm on Jan 21, 2012 (gmt 0)

I want to list a number of rewrite conditions (RewriteCond) and have them apply to only one rewrite rule (RewriteRule). My problem is with the RewriteRule. I would like the rule to apply to all pages ending in .php, and also to the main (default) website page served when only MyWebsite.com is entered as the URL.

Currently, I have it working for all .php pages using:

RewriteRule \.php - [F,L]

However, if someone enters "MyWebsite.com/" as the URL, then my (default) index.php file is loaded but this does not satisfy the RewriteRule (because "index.php" literally is not listed in the URL). Thus, I believe I need a RewriteRule of the form:

RewriteRule (xyz|\.php) - [F,L]

to capture both scenarios. However, I have not been able to determine what to replace xyz with so that it matches the index.php page when that name is not literally present in the URL.

Something like .* works but I would like to limit all the rewrite conditions to only the default page and .php pages.

Any help or suggestions would be appreciated.

lucy24

9:37 pm on Jan 21, 2012 (gmt 0)

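A request for the bare domain always arrives with a trailing slash, whether or not one was typed. Unlike other Directory Slash Redirects, which are done by the server, this one is added by the browser. In htaccess, where the leading slash is stripped, a bare domain name therefore comes through as an empty path, which the pattern ^$ matches.

What about other directories' index pages? Those too might come through as blahblah/ alone, in which case the Rule would have to include something like ^([^/]+/)*$ if you need to cover them. Note the closing anchor; without it that pattern matches every request.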
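
As a rough sketch of how those two cases could be folded into the original single rule: this assumes the rules live in the site's root .htaccess, and the RewriteCond shown is only a hypothetical placeholder for whatever conditions are actually in use.

```apache
RewriteEngine On

# Placeholder condition for illustration only (blank user agent);
# substitute the real RewriteCond lines here.
RewriteCond %{HTTP_USER_AGENT} ^$

# Match the bare domain (empty path), any directory request ending in "/",
# or any request ending in ".php".
RewriteRule (^([^/]+/)*$|\.php$) - [F]
```

If only the site root should be covered and not other directories, the first alternative can be reduced to a plain ^$.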

g1smd

10:08 pm on Jan 21, 2012 (gmt 0)

Irrespective of what the files on the server are called, mod_rewrite works with URL requests.

The RegEx patterns will need to match incoming URL requests, where these are localised on a per-directory basis, i.e. the leading slash of the path is stripped before being presented to mod_rewrite for processing.
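
To illustrate the stripped leading slash, here is a sketch of the same rule written for the two contexts. The file name index.php is only an example, and the two rules belong in different files.

```apache
# In .htaccess in the document root (per-directory context):
# the pattern is tested against "index.php", with no leading slash.
RewriteRule ^index\.php$ - [F]

# In the main server or <VirtualHost> configuration (per-server context):
# the pattern is tested against "/index.php", leading slash included.
RewriteRule ^/index\.php$ - [F]
```

By the same logic, a request for the bare domain is seen as an empty string in .htaccess and as "/" in the server configuration.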

Merganser

10:38 pm on Jan 21, 2012 (gmt 0)

OK - I guess I am realizing my rationale is fundamentally flawed. Here are a few examples of things I want to block (from my log):

"GET /w00tw00t.at.ISC.SANS.DFind:) HTTP/1.1"
"GET /).andSelf(),F=0;E.find( HTTP/1.0"
"GET /),K=document.createElement( HTTP/1.0"

I see now that ^$ does not match these either: although no specific page is named, the requested path is not empty.

I wanted to block each of these with RewriteCond statements followed by one RewriteRule statement. My reason for structuring it this way is really only for convenient grouping and aesthetic appearance within the .htaccess file (and so it is easier for me to interpret when I return to it 3 months later).

Obviously, the \.php does not match and neither does the ^$. There is no commonality to key off of for use in one RewriteRule so I am thinking now that it can only be accomplished with 3 separate RewriteRules. Something like:

RewriteRule w00tw00t - [F,L]
RewriteRule \.andSelf - [F,L]
RewriteRule createElement - [F,L]

Does this sound correct?

lucy24

2:11 am on Jan 22, 2012 (gmt 0)

Holy ###. Are those actual requests? With parentheses and everything?

If your real filenames are nice-- which, ahem, of course they are-- you can get rid of most of those bogus requests with a simple and elegant

RewriteCond %{REQUEST_URI} [^a-z/._-] [OR]
RewriteCond %{REQUEST_URI} (\.\.|//|--|__)
RewriteRule (\.php|/)$ - [F]

Meaning: if the request contains anything other than a lower-case letter, hyphen, lowline (leave out one or the other if you don't use them) dot or directory slash, OR it contains two of something there would never be two of ... stomp on it. No anchors; the character(s) only have to occur somewhere in the request.

Can't remember if mod_rewrite can handle standard lookaheads. If so, you can add an [OR] condition that says

\.(?!php$)

Remember that requests for nonexistent files are not really a problem unless your custom 404 page is much much bigger than your 403 page. It's just, hm, the principle of the thing isn't it ;)

Now, if you are brave enough to use ! in the rule itself you can make a second Rule that includes the element

!(/|\.php|\.png)$ et cetera, listing your real-life extensions.

In mod_rewrite, the NOT character ('!') is also available as a possible pattern prefix. This enables you to negate a pattern; to say, for instance: "if the current URL does NOT match this pattern". This can be used for exceptional cases, where it is easier to match the negative pattern, or as a last default rule.

That's under Rule, not Cond. They left out the boilerplate about "do this with extreme caution".

Merganser

3:27 am on Jan 22, 2012 (gmt 0)

These are actual requests - parentheses and everything, I copied/pasted from my log.

I understand the rewrite conditions and they do work. However, in order to get them to work I had to change the rewrite rule to:

RewriteRule .* - [F]

These requests do not include ".php" in them at all and they do not end with a "/". So, I am guessing the rewrite rule evaluated to be false and the conditions were therefore never checked.

I think this is the heart of my problem. I can't seem to ban these without using ".*" in the rule (which I want to avoid doing). Hence why I am coming around to the idea that it can only be accomplished with multiple rules (each containing something similar to the conditions you recommended).
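
The order of evaluation described above can be sketched as follows. mod_rewrite tests the RewriteRule pattern first and only looks at the preceding RewriteCond lines if that pattern matched, so a narrow pattern can keep the conditions from ever running. The directives are the ones suggested earlier in the thread; the comments are added annotations.

```apache
# Step 2: checked only if the RewriteRule pattern below has already matched.
RewriteCond %{REQUEST_URI} [^a-z/._-] [OR]
RewriteCond %{REQUEST_URI} (\.\.|//|--|__)
# Step 1: the pattern is tested first. A request such as
# "w00tw00t.at.ISC.SANS.DFind:)" does not end in ".php" or "/",
# so the rule is skipped and the conditions are never evaluated.
RewriteRule (\.php|/)$ - [F]
```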

I have to admit I don't understand your reference to "lookaheads" and use of the code \.(?!php$). I can't even seem to interpret it. Primarily confused by the ? and !.

I also would rather not use !.

lucy24

6:07 am on Jan 22, 2012 (gmt 0)

A lookahead or lookbehind-- those are the formal technical terms-- is a nifty feature of Regular Expressions. It means "this character or group of characters is (or is not) followed (or preceded) by such-and-such". The form

\.(?!php)

means "a literal period that is not immediately followed by the string 'php'". Notice how this is not the same thing as the familiar

!\.php

which simply means "does not contain the whole period-plus-php sequence". And it's much less clunky than

\.([^p]|p[^h]|ph[^p])

The syntax is:

stuff(?=blahblah)     positive lookahead: "the specified stuff is followed by blahblah"
stuff(?!blahblah)     negative lookahead: "...is not followed by..."
(?<=blahblah)stuff    positive lookbehind: "...is preceded by..."
(?<!blahblah)stuff    negative lookbehind: "...is not preceded by..."

Useful huh? The blahblah can be replaced by pipe-separated alternatives, or by bracketed groups. Unlike the normal parenthesis syntax, in a lookahead or lookbehind (?=foo|bar) means "followed by 'foo' OR followed by 'bar'". You don't repeat the leading ? part.

Detour here: I realized that before I blather on any further about such a useful feature, I had better test whether it actually works in mod_rewrite. It does! But maybe you should practice in a text editor or something similarly harmless before cutting loose in htaccess.
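
A minimal sketch of the lookahead used directly in a rule. It assumes a site where .php is the only extension that should ever be requested, which is an assumption made purely for illustration; a real site would normally also need to allow images, stylesheets and so on. The file names in the comments are hypothetical.

```apache
# Forbid any request containing a dot that is not the start of a final ".php".
#   "page.php"      the only dot is followed by "php" and the end: allowed
#   "page.php.bak"  the first dot is followed by "php.bak": forbidden
#   "logo.png"      the dot is followed by "png": forbidden
RewriteRule \.(?!php$) - [F]
```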

So you want to deal with requests that can have absolutely any form. Take the Condition from before and let's dump it into the Rule itself.

RewriteRule [^a-z_./-] - [F,L]

That was the easy one :) Anything containing bad characters anywhere is summarily dumped, whether it ends in .php or not. Again, I detoured to test this to make sure requests don't contain invisible "end of transmission" characters that would make the Rule fail every time.

How many of your unwanted requests does that take care of?
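
One thing to check before relying on the rule above: its character class contains no digits, so a legitimate URL such as page2.php (a hypothetical name) would be refused as well. If the site's real URLs contain digits, a variant along these lines may be closer to what is wanted:

```apache
# Forbid any request containing a character other than a lower-case letter,
# digit, underscore, dot, slash or hyphen. The hyphen comes last in the class
# so that it is read literally rather than as a range.
RewriteRule [^a-z0-9_./-] - [F]
```

All three log entries quoted earlier in the thread are still caught by this version, because each of them contains a parenthesis.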

g1smd

8:11 am on Jan 22, 2012 (gmt 0)

Please can we stop this [F,L] business?

It's [F] only for this. Likewise for [G].

mod_rewrite processing stops when these are served.

lucy24

9:24 am on Jan 22, 2012 (gmt 0)

Also [P] and [PT] :-P

My own htaccess just has [F]. But I'm thinking it's better to include a superfluous [L] than to omit a necessary one. Especially with redirects, where it seems as if [R] would have to imply [L]-- but it doesn't.
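
For example, a redirect normally carries an explicit [L], whereas [F] needs nothing more. The file names here are hypothetical:

```apache
# [R] does not end rule processing by itself, so [L] is added explicitly.
RewriteRule ^old-page\.php$ /new-page.php [R=301,L]

# [F] already implies [L]; adding it would change nothing.
RewriteRule ^private\.php$ - [F]
```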

Merganser

6:46 pm on Jan 22, 2012 (gmt 0)

Ok - Thanks guys. I understand the lookahead/lookbehind now. With 6 or 7 statements I think I have banned the 2k+ bogus queries which occurred last month. And I think I have accomplished this with minimal evaluations in .htaccess.

For the record, I also changed my [F,L]s to [F]s.