mod rewrite rule not working with multiviews enabled

extensionless urls, extension, content negotiation

         

elbowlobstercowstand

5:53 pm on Mar 8, 2008 (gmt 0)

10+ Year Member



Don't let this long-looking post frighten you... just being thorough in my initial request. K. I'm using mod_rewrite to make pretty urls point to a php file that sucks in a city and displays it. Everything works as expected until content negotiation is enabled. First, the working setup:

.htaccess:


RewriteEngine on
RewriteBase /
RewriteRule ^/?test\.php/([a-zA-Z_]+)$ test?city=$1 [L]

test.php:


<?php
echo "<p>City = {$_GET['city']}</p>";
echo $_SERVER['REQUEST_URI'];
?>

uri input = uri output:


http://www.example.com/test.php/phoenix

browser output:


City = phoenix
/test.php/phoenix

The above works as expected. Now add content negotiation (line 2 of .htaccess) and change the rule to use extensionless urls, and things stop working:

.htaccess:


RewriteEngine on
Options +multiviews
RewriteBase /
RewriteRule ^/?test/([a-zA-Z_]+)$ test?city=$1 [L]

test.php:


<?php
echo "<p>City = {$_GET['city']}</p>";
echo $_SERVER['REQUEST_URI'];
?>

uri input = uri output:


http://www.example.com/test/phoenix

browser output:


City =
/test/phoenix

I'm not sure why this is. What baffles me is that I'm echoing REQUEST_URI in both cases, and both values are just what I expect: one with an extension and the other without. Really, this file is just a test on the way to bigger and better things: getting "my" http -> https .htaccess file working without extensions (it currently doesn't). See this post for jdMorgan's excellent advice and thorough details:

[webmasterworld.com ]

And as a side note, jdMorgan mentions a different approach to extensionless urls which is here:

[webmasterworld.com ]

jdMorgan, what be thine take on content negotiation/multiviews? Is there a reason you don't suggest it in the above post? (Perhaps you don't because of the problems I'm having now!)

Looking forward to your advice as you are perhaps the most well known mod_rewrite ninja on this site!

jdMorgan

6:30 pm on Mar 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well the basic problem appears to be that mod_negotiation is getting control before mod_rewrite on your server, and it's diverting the requests away before mod_rewrite can affect them.
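If MultiViews turns out not to be needed, one way to keep mod_negotiation out of the picture is to switch it off explicitly and let mod_rewrite do the mapping. This is only a minimal sketch. It assumes the .htaccess file sits in the document root, that the server's AllowOverride setting permits Options in .htaccess, and that test.php is the script from the first post:

```apache
# Turn off content negotiation so /test is not silently mapped to test.php
Options -MultiViews

RewriteEngine on
RewriteBase /
# Internally rewrite /test/phoenix to /test.php?city=phoenix
RewriteRule ^test/([a-zA-Z_]+)$ test.php?city=$1 [L]
```

With negotiation off, the rule target has to name the real file (test.php), because nothing else will add the extension.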

Questions:

Why do you need MultiViews?

If you really need them, have you looked at the content-negotiation configuration to be sure you've excluded URLs that must/should be handled by mod_rewrite? (This would have to be done using a <Directory> container in httpd.conf or conf.d to limit Options MultiViews to certain directories only.)

Have you considered the feasibility of replacing MultiViews' function by more-selective mod_rewrites, or by handling such functions in your script(s)? (The "file exists" checking available using mod_rewrite's RewriteCond directive can be particularly handy for this.)
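The <Directory> approach from the second question might look something like the sketch below. The paths are hypothetical placeholders, not taken from this thread, and the block belongs in httpd.conf (or a conf.d file), not in .htaccess:

```apache
# Hypothetical document root: content negotiation off by default
<Directory "/var/www/example">
    Options -MultiViews
</Directory>

# Hypothetical subdirectory that really does need negotiated content
<Directory "/var/www/example/docs">
    Options +MultiViews
</Directory>
```

URLs outside the second directory are then left for mod_rewrite to handle.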

I personally don't like using MultiViews because it can create duplicate-content problems (the same content accessible via multiple URLs), thus posing a potentially-big problem for search optimization/ranking.

Jim

jdMorgan

6:39 pm on Mar 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I should also add that content-negotiation is fundamentally flawed: It presumes that the user has configured his/her browser for language and file-format preferences. While this sounded good in the early days of the Web, it no longer applies now that non-technical people are using the Web, and are now in the majority. And they're often using it from internet cafes in ethnically-diverse regions of the world, and therefore, there is no longer any dependable relationship between the language preferences of the user and the language-preference setting of the browser.

The same is true for geo-location. Consider again these ethnically-diverse regions: Picking a language based on geo-location is also a flawed strategy, even in Europe! -- Maybe the Swiss user wants French, or Italian, or German...

The only way to reliably determine user preferences is to explicitly ask the user --within the current session-- to select a language and (possibly) other preferences; That's why many multi-language sites have all those little national flags all over the front page... :)

Jim

elbowlobstercowstand

9:21 pm on Mar 8, 2008 (gmt 0)

10+ Year Member



Honestly... my thought on MultiViews was that uri's look cleaner. That's why I chose to use them. Now I'm a bit embarrassed that I didn't research it... but your timing is impeccable, as I haven't launched yet.

Hearing about the possible search engine ding... well that is HUGE, and has me running back to .php extensions. I'll be converting my site now (totally serious).

Now I have EXTREME interest in the mod_rewrite and file_exists option you spoke of. I would still like to have clean urls. And it sounds like I can. Is this what the rewrite would look like?

RewriteRule ^/?([a-z]+)$ http://www.example.com/$1.php [L]

Perhaps you have covered a mod_rewrite/file-exists method in another post. I'd love to know your methodology as I want to be using best practices.

Thanks so much jdMorgan!

PS: Wait, I just realized I don't have to convert my site's current linking structure at all (which have no .php extensions). The rule automatically has them reference the real file. Freaking cool!

elbowlobstercowstand

10:43 pm on Mar 8, 2008 (gmt 0)

10+ Year Member



Ok, I got this rule to work to make extensions "invisible". When I request index, I do in fact see the index.php page, AND the extension is not displayed in the address bar:


RewriteRule ^/?([a-z]+)$ http://www.example.com/$1.php [L]

However, I'm trying to marry that rule with the http to https rules that "we" wrote before ([webmasterworld.com ]), and things aren't working (I modified the page names to drop the extensions, and replaced the (.*) pattern and the [R=301,L] flags):


RewriteCond %{REQUEST_URI} !\.(css|jpeg|js|gif|png)$ [NC]
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{REQUEST_URI} ^/?(secure|owner)$
RewriteRule ^/?([a-z]+)$ https://www.example.com/$1.php [L]

RewriteCond %{REQUEST_URI} !\.(css|jpeg|js|gif|png)$ [NC]
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{REQUEST_URI} !^/?(secure|owner)$
RewriteRule ^/?([a-z]+)$ http://www.example.com/$1.php [L]

When I request http://www.example.com/secure, the browser loads the https version, but also tacks on the .php extension! (https://www.example.com/secure.php) This makes NO sense to me, because this is the EXACT standalone rule I was using to make invisible extensions.

Here is my understanding: For the RewriteRule, it is capturing the REQUEST_URI (not from the previous step, just from the actual request), which in my case is "secure" (with no leading slash and NO extension)... it should then shove that baby into the end of where it is actually pointing. Thus, pointing to http://www.example.com/secure.php [while at the same time not showing it]. But it does show the extension.

There must be an interplay between the condition and the rewrite rule. Is the conditional just b4 the rule actually giving a variable to the rule?! I thought that's what parentheses did. I mean, I thought you have to use the var on the same line, and that for conditions, you ref by % and for rules you ref by $.

I do realize there is a screwiness that jdMorgan clued me in on before... that leading forward slashes are needed in conditions, but not in rules (I think this depends on your version of apache, but regardless the two act opposite). If you can't tell, I'm thoroughly confused. But confusion comes before the storm. Wait... :o) ... before the storm blows over. I'm not sure what I'm saying. I can't believe how HARD mod_rewrite is! I wish you could throw in echos or something!

g1smd

12:40 am on Mar 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mod_Rewrite sees the leading / if the rule is in your httpd.conf file but does not see it if the rule is in your .htaccess file.
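To illustrate the difference, here is the same rule written for both contexts. It is a sketch that reuses the test.php example from earlier in the thread. Note that, per the mod_rewrite documentation, the leading slash is seen in server and <VirtualHost> context, while rules placed inside a <Directory> block in httpd.conf behave like .htaccess rules:

```apache
# httpd.conf, server or <VirtualHost> context:
# the pattern is matched against "/test/phoenix"
RewriteRule ^/test/([a-zA-Z_]+)$ /test.php?city=$1 [L]

# .htaccess in the document root: the per-directory prefix is stripped,
# so the pattern is matched against "test/phoenix"
RewriteRule ^test/([a-zA-Z_]+)$ /test.php?city=$1 [L]
```

A pattern that starts with ^/? matches in either place, which is why that form appears in the rules posted above.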

One extra thing to think about is that serving both / and /index is another Duplicate Content issue. One of those should issue a 301 redirect or a 404 error, and the other should serve the content along with "200 OK".
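One possible way to handle the redirect half of that is sketched below. It assumes .htaccess in the document root, index.php as the DirectoryIndex file, and www.example.com as the canonical hostname. It has not been tested against this particular setup, and it would need to sit ahead of any internal extensionless-to-.php rewrite:

```apache
# Redirect direct client requests for /index or /index.php to /
# THE_REQUEST holds the original request line (e.g. "GET /index.php HTTP/1.1"),
# so the internal DirectoryIndex lookup for "/" does not trigger the redirect
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index(\.php)?[?\ ]
RewriteRule ^index(\.php)?$ http://www.example.com/ [R=301,L]
```

Checking THE_REQUEST rather than the current URL-path is what keeps this from looping once the server maps "/" to index.php internally.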

jdMorgan

1:22 am on Mar 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You are mixing external redirects and internal rewrites. Use the previously-developed SSL/non-SSL external redirect rules, and then place your extensionless-to-php internal rewrite rule after them.
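A stripped-down sketch of the distinction, using the page names from this thread: when the target is an absolute URL whose scheme or host differs from the current request, mod_rewrite cannot serve it internally and answers with an external redirect, so anything in that target (including ".php") ends up in the browser's address bar.

```apache
# 1) External redirect first: [R=301] sends the browser to the HTTPS URL,
#    so only the extensionless path belongs in the target
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^(secure|owner)$ https://www.example.com/$1 [R=301,L]

# 2) Internal rewrite afterwards: a path-only target maps the request to
#    the .php file on disk without changing the URL the browser shows
RewriteRule ^([a-z]+)$ /$1.php [L]
```

The standalone rule pointing at http://www.example.com/ stayed invisible because that scheme and hostname matched the current request, in which case mod_rewrite strips them and treats the rest as a local path.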

Jim

jdMorgan

1:51 am on Mar 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That probably wasn't clear. Here's what I meant. I also added five rules at the end to check for various existing filetypes when resolving extensionless URLs -- Feel free to re-order them to change their priority, or delete them if not needed. You may in fact be perfectly happy with only the first three rules if all extensionless URLs are to be unconditionally resolved to .php files.

# Externally redirect HTTP requests for "/secure" and "/owner" to HTTPS (SSL),
# except for css, jpg, jpeg, js, gif, and png files
RewriteCond %{REQUEST_URI} !\.(css|jpe?g|js|gif|png)$ [NC]
RewriteCond %{REQUEST_URI} ^/(secure|owner)
RewriteCond %{SERVER_PORT} !^443$
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
#
# Externally redirect HTTPS requests for all except "/secure" and "/owner" to HTTP,
# except for css, jpg, jpeg, js, gif, and png files
RewriteCond %{REQUEST_URI} !\.(css|jpe?g|js|gif|png)$ [NC]
RewriteCond %{REQUEST_URI} !^/(secure|owner)
RewriteCond %{SERVER_PORT} ^443$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Simple rule: Internally rewrite all extensionless URLs to .php
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.php [L]
#
#
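# More-complex rules: Check for existing files with .php, .html, .htm, .shtml, and .shtm extensions
# (use these in place of the simple rule above, which would otherwise match every extensionless URL first)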
#
# If extensionless URL resolves to existing .php file, rewrite to .php
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.php [L]
#
# Else if extensionless URL resolves to existing .html file, rewrite to .html
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]
#
# Else if extensionless URL resolves to existing .htm file, rewrite to .htm
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.htm [L]
#
# Else if extensionless URL resolves to existing .shtml file, rewrite to .shtml
RewriteCond %{REQUEST_FILENAME}.shtml -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.shtml [L]
#
# Else if extensionless URL resolves to existing .shtm file, rewrite to .shtm
RewriteCond %{REQUEST_FILENAME}.shtm -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.shtm [L]

Note also the slightly-more-complicated regular expressions, intended to support extensionless URLs at any subdirectory level, and to allow any characters in the final URL-path-part except for a period or a slash -- the period indicating that there's a file extension, and the slash meaning that it's a directory, not a file.
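To make the pattern concrete, here is the simple rule again with a few sample URL-paths worked through in comments. The sample paths are made up for illustration and are shown as .htaccess sees them, without the leading slash:

```apache
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.php [L]
# "about"           matches,  $1 = "about"           -> /about.php
# "blog/2008/post"  matches,  $1 = "blog/2008/post"  -> /blog/2008/post.php
# "v1.2/page"       matches   (periods are allowed in directory parts)
# "style.css"       no match  (period in the final path-part)
# "blog/"           no match  (trailing slash, so it is a directory)
# ""                no match  (request for the root itself)
```

Because the rewritten path always contains a period in its final part, it cannot match the pattern a second time, so the rule does not loop.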

BTW, it's really not a bad idea to leave comments in your code -- at least until you're finished working on a site.

Jim

elbowlobstercowstand

7:16 am on Mar 10, 2008 (gmt 0)

10+ Year Member



Humbled again! Those first three rules are EXACTLY what I needed, and they work like a charm!

Admittedly I haven't gone through your more complex "extensionless" rewrite... but I'm soon going to do as you suggested to another member: print it out, and highlight only the things I understand, never moving through unless I understood the beginning. And I generally use comments... I'll start posting them.

A couple of questions:
1) Was there a reason you moved the port rule down one? Performance?
2) As a rule, redirects BEFORE rewrites. Correct?

Also, regarding what g1smd wrote: great suggestion! I did a quick search and tried to implement [webmasterworld.com ]

...but it wasn't the quick fix I thought it may be (though it seems to be a replica of my desired outcome). When I type in http://www.example.com/index.php OR /index, it just sticks, and doesn't redirect to the root.

Perhaps I put the new rules in the wrong place (before my http -> https rules). I'm struggling a little with knowing the "stacking" order to place things in.

PS: Also looking for a way to convert example.com to www.example.com

I don't expect you guys to continue pumping out solutions for me, but I certainly won't hold you back! I'm definitely learning from your posts, and am super-appreciative of your effort.