Forum Moderators: phranque


RewriteCond & RewriteRule

Please provide explanations for a set of conditions and rewrite rules

         

jason1989

7:03 pm on May 22, 2008 (gmt 0)

10+ Year Member



Dear Forum Members:

Please help me understand something that is driving me mad. But first, I believe I have a pretty good understanding of what a RewriteRule like this does:

RewriteRule ^(.*).htm$ $1.php

Let me explain what I believe it does. Because of the ^ symbol, it begins at the very beginning and stores everything up to but not including .htm

The first $ sign indicates that nothing follows .htm and it ends there.

The second $ sign, in $1, takes the backreferenced data (all characters matched within (.*)) and puts it there with .php as the new extension.

So for instance, if we have:

index.htm it will be rewritten to index.php
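As a minimal sketch of what mod_rewrite does with a rule like this (written with C++ std::regex, which only approximates Apache's PCRE engine): if the pattern matches, the whole URL-path is replaced by the substitution, with $1 filled in from the first bracketed group. The helper name apply_rule is hypothetical. The sketch escapes the dot as \.htm; in the pattern as written, an unescaped . matches any character, not just a literal dot.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper modelling one RewriteRule: if `pattern` matches
// anywhere in `url_path`, the whole path is replaced by `substitution`,
// with $1, $2, ... expanded from the capture groups. No match -> nullopt.
std::optional<std::string> apply_rule(const std::string& pattern,
                                      const std::string& substitution,
                                      const std::string& url_path) {
    std::smatch m;
    if (!std::regex_search(url_path, m, std::regex(pattern))) {
        return std::nullopt;  // rule does not fire; the path is left alone
    }
    return m.format(substitution);  // "$1" -> the text captured by (.*)
}
```

For example, apply_rule(R"(^(.*)\.htm$)", "$1.php", "index.htm") yields "index.php", while "index.html" does not match at all. Passing the pattern (.*) with the substitution index.php shows the same mechanism with no back-reference: every path comes out as index.php.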

This is what I just cannot figure out. I am totally dumbfounded by what the following set of conditions/rules does.

#############################
RewriteCond %{REQUEST_URI} (/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$ [NC]

RewriteCond %{REQUEST_FILENAME} !-f

RewriteCond %{REQUEST_FILENAME} !-d

RewriteRule (.*) index.php
#############################

Could somebody please take the time to carefully explain this to me?

Thank you

Marcia

9:08 pm on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let me explain what I believe it does. Because of the ^ symbol, it begins at the very beginning and stores everything up to but not including .htm

The first $ sign indicates that nothing follows .htm and it ends there.


If the first $ is the end of the string being matched in the expression, why do you think that .htm isn't included? Why? Where does it say that?

If the $ in the second half of the expression is being used to match what's in the (parentheses) that precede it in the first half of the expression, for purposes of matching (like a variable), why do you think that the .htm will be ignored?

AFAIK, it won't be: ((something).htm) won't be ignored, but if it's ((something).html) it will be ignored because it isn't a match - or the converse, which is exactly something I did just yesterday.

[edited by: Marcia at 9:18 pm (utc) on May 22, 2008]

Receptional Andy

9:16 pm on May 22, 2008 (gmt 0)



If the first $ is the end of the string being matched in the expression, why do you think that .htm isn't included? Why? Where does it say that?

I'm not the best at mod_rewrite or regex, so corrections are welcome!

The htm is matched, but as it isn't within brackets it isn't passed along to $1. So, the .htm is essentially dropped from the rewriterule.
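The difference between what the pattern matches and what ends up in $1 can be seen directly with std::smatch in C++ (an approximation; Apache itself uses PCRE): m[0] holds the whole match and m[1] holds only the bracketed group. The function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns {whole match, first capture group} for
// the pattern ^(.*)\.htm$, or two empty strings if it does not match.
std::pair<std::string, std::string> match_and_capture(const std::string& path) {
    static const std::regex rule(R"(^(.*)\.htm$)");
    std::smatch m;
    if (!std::regex_match(path, m, rule)) {
        return {"", ""};
    }
    // m[0]: everything the pattern matched, including ".htm"
    // m[1]: only what (.*) captured, i.e. what $1 refers to
    return {m[0].str(), m[1].str()};
}
```

For "index.htm" this gives "index.htm" as the match but only "index" as the capture, which is why the .htm disappears from the rewritten result.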

RewriteCond %{REQUEST_URI} (/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$ [NC]

This matches requested URLs ending in a slash, in one of the named file extensions, or in an extensionless final path segment (a slash followed by no dots). The bar character means OR, and [NC] makes the match case-insensitive.
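As a rough check of which request URIs satisfy that first RewriteCond, here is the same pattern in C++ std::regex, with icase standing in for [NC]. This is a sketch that assumes ECMAScript regex behaves like Apache's PCRE for these simple constructs; the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring
//   RewriteCond %{REQUEST_URI} (/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$ [NC]
// The pattern is anchored only at the end ($), so regex_search is used.
bool passes_extension_cond(const std::string& request_uri) {
    static const std::regex cond(
        R"((/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$)",
        std::regex::ECMAScript | std::regex::icase);  // icase ~ [NC]
    return std::regex_search(request_uri, cond);
}
```

With this, "/", "/sports/weblinks.html", "/feed.RSS" and the extensionless "/sports" all pass, while "/images/logo.gif" does not, so requests for images and other unlisted types are never handed to index.php.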


RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

These two conditions are to avoid conflicts: the first is satisfied only if the request does not map to a file that actually exists, the second only if it does not map to an existing directory.
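The two negated checks can be sketched with C++17 std::filesystem: -f tests for an existing regular file and -d for an existing directory, so the rewrite only proceeds when both tests fail. Treating REQUEST_FILENAME as simply the document root plus the request URI is a simplifying assumption, and the function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: true when both "!-f" and "!-d" would pass, i.e.
// nothing on disk (file or directory) corresponds to the request.
// Assumes REQUEST_FILENAME is simply doc_root + request_uri.
bool is_missing_on_disk(const fs::path& doc_root, const std::string& request_uri) {
    // relative_path() drops the leading "/" so operator/ appends instead of replacing
    const fs::path filename = doc_root / fs::path(request_uri).relative_path();
    return !fs::is_regular_file(filename) && !fs::is_directory(filename);
}
```

Only when this returns true (and the URI condition also matched) does the request fall through to index.php; real files such as images or an existing /sports/ directory are served normally.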

RewriteRule (.*) index.php

Everything matching these conditions is internally rewritten to index.php, which presumably then analyses the requested URL to determine what content to display. It's a way of creating 'virtual' filenames and folders via a single PHP file.
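What "analyses the requested URL" might look like, sketched in C++ rather than PHP to keep to one language here: a single front-controller function receives the original REQUEST_URI and picks content from a table. The routes and page text are invented for illustration, and the function name is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical front controller: every rewritten request reaches this one
// function, but the original REQUEST_URI is still available to it, so it
// can choose different content per URL. The routes here are made up.
std::string handle_request(const std::string& request_uri) {
    static const std::map<std::string, std::string> pages = {
        {"/sports/weblinks.html", "Sports web links"},
        {"/sports/news.html", "Sports news"},
    };
    const auto it = pages.find(request_uri);
    if (it == pages.end()) {
        return "404 Not Found";  // a real script would also send the 404 status
    }
    return it->second;
}
```

Both /sports/weblinks.html and /sports/news.html end up in the same script, yet each gets its own content, while an unknown URL such as /foo/bar.html falls through to the not-found branch.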

As above - any corrections gratefully accepted.

[edited by: Receptional_Andy at 9:18 pm (utc) on May 22, 2008]

Marcia

9:32 pm on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a working example from just yesterday, quick and simple:

RewriteEngine on
RewriteRule ^/directory/(.*).html$ http://www.example.com/directory/script.php?file=$1 [QSA]

If you type /directory/foo.html into the browser address bar, you will get whatever is equivalent to foo.html in the script. The $1 will backreference whatever (.*) is, with the indicated file extension appended. However, if you type /directory/foo.htm into the browser address bar, you will get a "404 Page not Found", since the script is generating (matching) .html pages, not .htm pages - so foo.htm is not a match. So how could it be, if the expression is ignoring the file extension?

The first part of the expression - what's to be matched - ends with the first $ at the end of the first "half" of the expression. If not, where does it say otherwise, that something before it would be ignored, if it wasn't excluded by a ! preceding?

What that's saying is that if your browser requests anything.html then the system will deliver to your browser whatever is represented by script.php?file=anything with that .html extension appended, because that is exactly what the first part of that expression told the system to match, using the (.*) as a wild card back reference for the $1 at the end of the second half of the expression. The first half is declaring a variable - (.*) - and a constant - .html so when the two are put together, they give the user agent the right file.

[Reminder to self: Keep repeating to self: URIs are not the same thing as files.]

[edited by: Marcia at 9:56 pm (utc) on May 22, 2008]

Receptional Andy

9:45 pm on May 22, 2008 (gmt 0)



Marcia: I'm not sure if that's a clarification or a question ;)

The regex

^/directory/(.*).html$

matches /directory/foo.html but not /directory/foo. However, only the part within brackets, (.*), is captured and stored in $1.

So in the case of /directory/foo the rule is never triggered, since there is no match. When there is a match, only the (.*) part (now stored in $1) is passed to the script.
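This can be checked with a small C++ sketch of Marcia's rule (std::regex rather than Apache's PCRE, with the dot escaped as \.html): the .html is part of the match, but only the captured name goes into file=, so no extension is appended. The function name is hypothetical, an empty string stands for "rule not applied", and the [QSA] flag (appending the original query string) is not modelled.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper for
//   RewriteRule ^/directory/(.*)\.html$ http://www.example.com/directory/script.php?file=$1
// Returns the substitution, or "" when the pattern does not match.
std::string directory_rule(const std::string& url_path) {
    static const std::regex rule(R"(^/directory/(.*)\.html$)");
    std::smatch m;
    if (!std::regex_search(url_path, m, rule)) {
        return "";
    }
    // Only the captured group (m[1], i.e. $1) is used; ".html" is dropped.
    return "http://www.example.com/directory/script.php?file=" + m[1].str();
}
```

So /directory/foo.html becomes ...script.php?file=foo, while /directory/foo.htm and /directory/foo never trigger the rule.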

My regex/mod_rewrite terminology is not great, granted ;)

jdMorgan

9:54 pm on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The confusion here is simply between "what is included in the match?" versus "what is included in the back-reference?" It looks like everybody is right, but the argument is about which question was asked...

Jim

Receptional Andy

9:58 pm on May 22, 2008 (gmt 0)



back-reference

That's the word I was looking for! :)

jason1989

10:08 pm on May 22, 2008 (gmt 0)

10+ Year Member



Thanks to everyone who responded. I do think that I should have been more to the point with my original question.

Take the following:

#############################
RewriteCond %{REQUEST_URI} (/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$ [NC]

RewriteCond %{REQUEST_FILENAME} !-f

RewriteCond %{REQUEST_FILENAME} !-d

RewriteRule (.*) index.php
#############################

To me, the RewriteRule looks as if it will simply append index.php to the REQUEST_URI.

So, for instance, if we analyze this URL:

[127.0.0.1...]

My (wild) guess is that the result will be:

/sports/weblinks.htmlindex.php

Can someone tell me what the result would be?

Receptional Andy

10:11 pm on May 22, 2008 (gmt 0)



The rule is an internal rewrite - not a redirect, so the requested URL will not change. What does change is what happens behind the scenes.

If the file /sports/weblinks.html exists, then it will be displayed as normal. Otherwise, the file /index.php will be displayed. Within index.php there could be code that checks what was requested and displays different content - hence /sports/weblinks.html could show something different to the user than /sports/news.html - even though index.php is the file that is called in both cases.

Marcia

10:19 pm on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jason, see if Jim's explanation in this thread about the difference between an external redirect and an internal rewrite clarifies anything:

[webmasterworld.com...]

With one, the file extension visibly changes in the browser (external redirect), but with the other (internal rewrite), it doesn't - it just delivers the file that's matched to the requested URL in the regex and the file requested stays the same, including the file extension. That's probably the hardest concept to labor through.

If the file /sports/weblinks.html exists, then it will be displayed as normal. Otherwise, the file /index.php will be displayed. Within index.php there could be code that checks what was requested and displays different content - hence /sports/weblinks.html could show something different to the user than /sports/news.html - even though index.php is the file that is called in both cases.

I'll have to figure that out related to putting in some error control (for mis-haps or goof-ups) in a particular instance.

[edited by: Marcia at 10:24 pm (utc) on May 22, 2008]

jason1989

10:21 pm on May 22, 2008 (gmt 0)

10+ Year Member



Hi Andy,

Thank you for responding. I am aware that the rewrite takes place behind the scenes and is not a redirect.

What I want to know is what the RewriteRule produces behind the scenes. My guess is:

/sports/weblinks.htmlindex.php

but it is just my guess. Can you tell me what is being rewritten?

Thanks again

Marcia

10:34 pm on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What I want to know is what the RewriteRule produces behind the scenes.

The RewriteRule doesn't *produce* anything; it just tells Apache that if abc is requested by the user agent, and the Rewrite Rule has said to match abc to xyz, then Apache is to deliver xyz to that user agent using an internal rewrite. The user agent will get exactly the URL requested, which will be matched and served up with the equivalent file that Apache has been told to deliver when that URL is requested.

Let's say that the RewriteRule says to deliver xyz.anything to the browser when abc.html is typed in. Then the xyz(.anything) file will be displayed to the browser when abc.html is requested, and abc.html will be what's shown as the URL.

[edited by: Marcia at 10:38 pm (utc) on May 22, 2008]

jason1989

10:35 pm on May 22, 2008 (gmt 0)

10+ Year Member



Marcia,

Thanks for the excellent link. I am pretty aware of the differences between external redirect and internal rewrite. What I am not familiar with is .htaccess and apache's mod_rewrite.

Unfortunately, I work with IIS and there is no mod_rewrite support. I have something pending where I need to know what the RewriteCond and RewriteRule above do.

If I understood your previous post above, /sports/weblinks.html will get rewritten to /index.php

Is this correct? And if so, the Rewrite rule will ALWAYS write /index.php so long as the conditions match?

Thanks again

Receptional Andy

10:37 pm on May 22, 2008 (gmt 0)



jason1989, I think your post may have happened at about the same time as Marcia's above, so you'll have missed it; it likely answers the question. But I'll stick my oar in anyway ;)

what the RewriteRule produces behind the scenes

Nothing at all - the same as requesting index.php directly. All that happens is that index.php is silently returned in response to the request. If index.php outputs the text "hello world" that's what any requested URL (other than those that match existing files and directories) will always display.

As the thread referenced explains, the only difference between a redirect and an internal rewrite is the redirect actually sends the visitor to the URL, instead of doing so behind the scenes.

Where this becomes useful is that index.php can look at the requested URL and display something different based on it, since an internal rewrite adds an extra variable - the requested URL vs the file that is actually displayed to the user.

And I forgot to say: welcome to WebmasterWorld, jason1989 :)

jason1989

10:40 pm on May 22, 2008 (gmt 0)

10+ Year Member



Assuming that the conditions and rule listed above apply, and the user agent's URL is:

/sports/weblinks.html
or
/tastey/ribs.html
or
/foo/bar.html

So long as the conditions match, the internal rewrite will always be:

/index.php

Sorry for going on and on but those conditions are very confusing to me.

[edited by: jason1989 at 10:45 pm (utc) on May 22, 2008]

Marcia

10:44 pm on May 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, ditto the welcome! But...

Unfortunately, I work with IIS and there is no mod_rewrite support

Arggghhhh... but there is ISAPI_Rewrite, which is the equivalent for IIS.

isapi rewrite [google.com]

[edited by: Marcia at 10:50 pm (utc) on May 22, 2008]

jason1989

11:08 pm on May 22, 2008 (gmt 0)

10+ Year Member



Wowzers! Well no wonder I was going bonkers!

I really was expecting something more than just /index.php each time.

I was expecting some parameters to be passed! I thought that the RewriteCond was some kind of magic that passed parameters to RewriteRule.

I have rewritten URLs before with an ISAPI filter I wrote in C++ that uses the Boost Regex library. For example:

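#include <boost/regex.hpp>
#include <string>

// Match: /music-rizzle/rizzle-girl-guy-t12.html#p64 ("\\." so the regex sees \., a literal dot)
boost::regex e("^(.*/).+-t([0-9]+)\\.html(#p[0-9]+)?$");

// Format: /viewtopic.php?t=12#p64 (a format string is not a regex, so no escape is needed)
std::string fmt = "/viewtopic.php?t=$2$3";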

As you can see, I take /music-rizzle/rizzle-girl-guy-t12.html#p64 and internally rewrite it to /viewtopic.php?t=12#p64
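For comparison, the same rewrite can be sketched with the standard library's std::regex (C++11) instead of Boost; the helper name is hypothetical. One caveat: browsers never send the #fragment to the server, so in practice a filter would only see /music-rizzle/rizzle-girl-guy-t12.html and the optional (#p[0-9]+) group would stay empty.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: topic-style URL -> viewtopic.php query.
// $2 is the topic number; $3 is the optional "#pNN" part (usually absent,
// since browsers do not send fragments to the server).
std::string rewrite_topic_url(const std::string& url) {
    static const std::regex topic(R"(^(.*/).+-t([0-9]+)\.html(#p[0-9]+)?$)");
    std::smatch m;
    if (!std::regex_match(url, m, topic)) {
        return url;  // not a topic URL: leave it unchanged
    }
    return m.format("/viewtopic.php?t=$2$3");  // an unmatched $3 expands to ""
}
```

Unlike the single `RewriteRule (.*) index.php`, this substitution carries the captured topic number into the query string, so the target script receives it as a parameter.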

What Andy said had never occurred to me:

"Where this becomes useful is that index.php can look at the requested URL and display something different based on it, since an internal rewrite adds an extra variable - the requested URL vs the file that is actually displayed to the user."

I'm going to have to do some testing now and see if what Andy said is true (just kidding).

Thank you kindly for letting me see what was going on with this.

Jason

g1smd

10:17 pm on May 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, that's the missing step.

After the rewrite, the script at index.php tests REQUEST_URI or other parameters like QUERY_STRING and works out what content to display.