Please help me understand something that is driving me mad. But first, I believe I have a pretty good understanding of what a RewriteRule like this does:
RewriteRule ^(.*).htm$ $1.php
Let me explain what I believe it does. Because of the ^ symbol, it begins at the very beginning and stores everything up to but not including .htm
The first $ sign indicates that nothing follows .htm and it ends there.
The second $ sign, in $1, is a back-reference: all the characters matched within (.*) are put there, with .php as the new extension.
So for instance, if we have:
index.htm it will be rewritten to index.php
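To check that reading, here is a minimal sketch that simulates the pattern with Python's re module rather than with Apache itself. The function name rewrite is made up for illustration. It assumes the pattern is tested against a bare path such as index.htm, and it also shows a side effect of the rule as posted: the dot before htm is unescaped, so it matches any single character.

```python
import re

# Pattern from the rule above, exactly as posted (the dot before "htm" is unescaped).
pattern = re.compile(r"^(.*).htm$")

def rewrite(path):
    """Return the rewritten path, or None when the pattern does not match."""
    m = pattern.search(path)
    if m is None:
        return None
    # $1 in mod_rewrite corresponds to group 1 here; ".php" is literal text.
    return m.group(1) + ".php"

print(rewrite("index.htm"))   # index.php
print(rewrite("index.html"))  # None
print(rewrite("indexXhtm"))   # index.php, because the unescaped "." matches any character
```

Writing the pattern as ^(.*)\.htm$ would make the dot literal, so only a real .htm extension would match.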
This is what I just cannot figure out. I am totally dumbfounded by what the following set of conditions/rules does.
#############################
RewriteCond %{REQUEST_URI} (/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php
#############################
Could somebody please take the time to carefully explain this to me?
Thank you
Let me explain what I believe it does. Because of the ^ symbol, it begins at the very beginning and stores everything up to but not including .htm. The first $ sign indicates that nothing follows .htm and it ends there.
If the $ in the second half of the expression is being used to refer back to what's in the parentheses in the first half of the expression (like a variable), why do you think that the .htm will be ignored?
AFAIK, it won't be: ((something).htm) won't be ignored, but if it's ((something).html) it will be ignored because it isn't a match - or the converse, which is exactly something I did just yesterday.
[edited by: Marcia at 9:18 pm (utc) on May 22, 2008]
If the first $ is the end of the string being matched in the expression, why do you think that .htm isn't included? Why? Where does it say that?
I'm not the best at mod_rewrite or regex, so corrections are welcome!
The htm is matched, but as it isn't within brackets it isn't passed along to $1. So, the .htm is essentially dropped from the rewriterule.
RewriteCond %{REQUEST_URI} (/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$ [NC]
This matches all of the named file extensions, and extensionless URLs in the requested URL - the bar character essentially means OR.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
These two conditions are to avoid conflicts - the first is met only if the request does not match a file that actually exists, the second only if it does not match an existing directory.
RewriteRule (.*) index.php
Everything matching these conditions is internally rewritten to index.php, which presumably then analyses the requested URL to determine what content to display. It's a way of creating 'virtual' filenames and folders via a single PHP file.
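As a rough illustration of how the three conditions combine, the sketch below simulates them in Python. It is not Apache's own logic: the function name rewrite_target and the existing_files and existing_dirs sets are hypothetical stand-ins for the -f and -d filesystem checks, and [NC] is modelled with re.IGNORECASE.

```python
import re

# First condition, with [NC] modelled as re.IGNORECASE.
URI_PATTERN = re.compile(
    r"(/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$", re.IGNORECASE
)

def rewrite_target(request_uri, existing_files=(), existing_dirs=()):
    """Return "index.php" if all three conditions hold, else None (request left alone)."""
    if not URI_PATTERN.search(request_uri):
        return None  # neither a listed extension nor an extensionless URL
    if request_uri in existing_files:
        return None  # stands in for !-f: a real file is served as normal
    if request_uri in existing_dirs:
        return None  # stands in for !-d: a real directory is served as normal
    return "index.php"

print(rewrite_target("/sports/weblinks.html"))  # index.php
print(rewrite_target("/about"))                 # index.php
print(rewrite_target("/style.css"))             # None
print(rewrite_target("/sports/weblinks.html",
                     existing_files={"/sports/weblinks.html"}))  # None
```

Under these assumptions a request for /style.css is never rewritten, because .css is not in the list and the final alternative /[^.]* cannot match a last path segment that contains a dot.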
As above - any corrections gratefully accepted.
[edited by: Receptional_Andy at 9:18 pm (utc) on May 22, 2008]
RewriteEngine on
RewriteRule ^/directory/(.*).html$ http://www.example.com/directory/script.php?file=$1 [QSA]
If you type /directory/foo.html into the browser address bar, you will get whatever is equivalent to foo.html in the script. The $1 will backreference whatever (.*) is, with the indicated file extension appended. However, if you type /directory/foo.htm into the browser address bar, you will get a "404 Page not Found", since the script is generating (matching) .html pages, not .htm pages - so foo.htm is not a match. So how could it be, if the expression is ignoring the file extension?
The first part of the expression - what's to be matched - ends with the first $ at the end of the first "half" of the expression. If not, where does it say otherwise, that something before it would be ignored, if it wasn't excluded by a ! preceding?
What that's saying is that if your browser requests anything.html then the system will deliver to your browser whatever is represented by script.php?file=anything with that .html extension appended, because that is exactly what the first part of that expression told the system to match, using the (.*) as a wild card back reference for the $1 at the end of the second half of the expression. The first half is declaring a variable - (.*) - and a constant - .html so when the two are put together, they give the user agent the right file.
[Reminder to self: Keep repeating to self: URIs are not the same thing as files.]
[edited by: Marcia at 9:56 pm (utc) on May 22, 2008]
The regex
^/directory/(.*).html$ matches directory/foo.html but not directory/foo. However only the part within brackets (.*) is captured and stored in $1. So in the case of
/directory/foo the rule is never triggered since there is no match. When there is a match, only the (.*) part (now stored in $1) is passed to the script. My regex/mod_rewrite terminology is not great, granted ;)
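The same point can be sketched in Python's re module (a simulation only, with the hypothetical function name script_url): the whole pattern decides whether the rule fires, while only the bracketed part ends up in the target.

```python
import re

# Pattern from the rule under discussion, as posted.
RULE = re.compile(r"^/directory/(.*).html$")

def script_url(path):
    """Return the target built from $1, or None when the rule is not triggered."""
    m = RULE.search(path)
    if m is None:
        return None
    # Only group 1 (the part in brackets) is carried over; ".html" is matched but dropped.
    return "http://www.example.com/directory/script.php?file=" + m.group(1)

print(script_url("/directory/foo.html"))  # http://www.example.com/directory/script.php?file=foo
print(script_url("/directory/foo.htm"))   # None
print(script_url("/directory/foo"))       # None
```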
back-reference
That's the word I was looking for! :)
Take the following:
#############################
RewriteCond %{REQUEST_URI} (/|\.atom|\.rss|\.htm|\.php|\.pdf|\.html|/[^.]*)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php
#############################
To me, the RewriteRule looks as if it will simply append index.php to the REQUEST_URI.
So, for instance, if we analyze this URL:
[127.0.0.1...]
My (wild) guess is that the result will be:
/sports/weblinks.htmlindex.php
Can someone tell me what the result would be?
If the file /sports/weblinks.html exists, then it will be displayed as normal. Otherwise, the file /index.php will be displayed. Within index.php there could be code that checks what was requested, and displays different content - hence /sports/weblinks.html could show something different to the user than /sports/news.html - even though index.php is the file that is called in both cases.
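On the question of whether index.php gets appended: in a RewriteRule the substitution replaces the matched path as a whole, and nothing from the original request survives unless a back-reference such as $1 puts it there. A small Python sketch of that behavior follows. The helper apply_rule is hypothetical, and the path parameter in the second call is an invented name used only for contrast.

```python
import re

def apply_rule(pattern, substitution, path):
    """Simulate one RewriteRule: on a match, the substitution replaces the whole path."""
    m = re.search(pattern, path)
    if m is None:
        return path  # no match: the path is left unchanged
    # Translate mod_rewrite's $N back-references into Python's \g<N> form.
    template = re.sub(r"\$(\d)", r"\\g<\1>", substitution)
    return m.expand(template)

# The rule from the question: (.*) is captured, but $1 is never used in the target.
print(apply_rule(r"(.*)", "index.php", "sports/weblinks.html"))
# index.php

# Contrast: a target that does use the back-reference.
print(apply_rule(r"(.*)", "index.php?path=$1", "sports/weblinks.html"))
# index.php?path=sports/weblinks.html
```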
[webmasterworld.com...]
With one, the file extension visibly changes in the browser (external redirect), but with the other (internal rewrite), it doesn't - it just delivers the file that's matched to the requested URL in the regex and the file requested stays the same, including the file extension. That's probably the hardest concept to labor through.
If the file /sports/weblinks.html exists, then it will be displayed as normal. Otherwise, the file /index.php will be displayed. Within index.php there could be code that checks what was requested, and displays different content - hence /sports/weblinks.html could show something different to the user than /sports/news.html - even though index.php is the file that is called in both cases.
[edited by: Marcia at 10:24 pm (utc) on May 22, 2008]
Thank you for responding. I am aware that the rewrite takes place behind the scenes and is not a redirect.
What I want to know is what the RewriteRule produces behind the scenes. My guess is:
/sports/weblinks.htmlindex.php
but it is just my guess. Can you tell me what is being rewritten?
Thanks again
What I want to know is what the RewriteRule produces behind the scenes.
Let's say that the RewriteRule says to deliver xyz.anything to the browser when abc.html is typed in. Then, the xyz(.anything) file will be displayed to the browser when abc.html is requested, and abc.html will be what's shown as the URL.
[edited by: Marcia at 10:38 pm (utc) on May 22, 2008]
Thanks for the excellent link. I am pretty aware of the differences between an external redirect and an internal rewrite. What I am not familiar with is .htaccess and Apache's mod_rewrite.
Unfortunately, I work with IIS and there is no mod_rewrite support. I have something pending where I need to know what the RewriteCond and RewriteRule above do.
If I understood your previous post above, /sports/weblinks.html will get rewritten to /index.php
Is this correct? And if so, the Rewrite rule will ALWAYS write /index.php so long as the conditions match?
Thanks again
what the RewriteRule produces behind the scenes
Nothing at all - the same as requesting index.php directly. All that happens is that index.php is silently returned in response to the request. If index.php outputs the text "hello world" that's what any requested URL (other than those that match existing files and directories) will always display.
As the thread referenced explains, the only difference between a redirect and an internal rewrite is the redirect actually sends the visitor to the URL, instead of doing so behind the scenes.
Where this becomes useful is that index.php can look at the requested URL and display something different based on it, since an internal rewrite adds an extra variable - the requested URL vs the file that is actually displayed to the user.
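A sketch of that idea, written in Python purely as a stand-in for whatever index.php actually contains (the function front_controller and the page table are invented for illustration). In PHP itself the originally requested URL is typically read from $_SERVER['REQUEST_URI'].

```python
def front_controller(request_uri):
    """Hypothetical stand-in for index.php: pick content from the originally requested URL."""
    pages = {
        "/sports/weblinks.html": "Sports links",
        "/sports/news.html": "Sports news",
    }
    # Every rewritten request arrives at the same script; only request_uri differs.
    return pages.get(request_uri, "404 Not Found")

print(front_controller("/sports/weblinks.html"))  # Sports links
print(front_controller("/sports/news.html"))      # Sports news
print(front_controller("/foo/bar.html"))          # 404 Not Found
```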
And I forgot to say: welcome to WebmasterWorld, jason1989 :)
/sports/weblinks.html
or
/tastey/ribs.html
or
/foo/bar.html
So long as the conditions match, the internal rewrite will always be:
/index.php
Sorry for going on and on but those conditions are very confusing to me.
[edited by: jason1989 at 10:45 pm (utc) on May 22, 2008]
Unfortunately, I work with IIS and there is no mod_rewrite support
isapi rewrite [google.com]
[edited by: Marcia at 10:50 pm (utc) on May 22, 2008]
I really was expecting something more than just /index.php each time.
I was expecting some parameters to be passed! I thought that the RewriteCond was some kind of magic that passed parameters to RewriteRule.
For example, I have rewritten URLs before with an ISAPI filter I wrote in C++ that uses the Boost Regex library. For example:
// Match: /music-rizzle/rizzle-girl-guy-t12.html#p64
// (the backslash is doubled so that the regex engine, not the C++ compiler, sees \.)
e = "^(.*/).+-t([0-9]+)\\.html(#p[0-9]+)?$";
// Format: /viewtopic.php?t=12#p64
fmt = "/viewtopic.php?t=$2$3";
As you can see, I take /music-rizzle/rizzle-girl-guy-t12.html#p64 and internally rewrite it to /viewtopic.php?t=12#p64
What Andy said had never occurred to me:
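For comparison with the index.php rule, the same match-and-format step can be sketched with Python's re module. This is not the ISAPI filter itself, and the function name to_viewtopic is made up. Here the back-references carry parameters into the target, which is exactly what the index.php rule does not do.

```python
import re

# Same expression as above; a raw string avoids doubling the backslash.
PATTERN = re.compile(r"^(.*/).+-t([0-9]+)\.html(#p[0-9]+)?$")

def to_viewtopic(path):
    """Return the rewritten URL, or None when the path does not match."""
    m = PATTERN.search(path)
    if m is None:
        return None
    # Group 2 is the topic number; group 3 (the #p anchor) is optional.
    return "/viewtopic.php?t=" + m.group(2) + (m.group(3) or "")

print(to_viewtopic("/music-rizzle/rizzle-girl-guy-t12.html#p64"))  # /viewtopic.php?t=12#p64
print(to_viewtopic("/music-rizzle/rizzle-girl-guy-t12.html"))      # /viewtopic.php?t=12
```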
"Where this becomes useful is that index.php can look at the requested URL and display something different based on it, since an internal rewrite adds an extra variable - the requested URL vs the file that is actually displayed to the user."
I'm going to have to do some testing now and see if what Andy said is true (just kidding).
Thank you kindly for letting me see what was going on with this.
Jason