Another extensionless mod rewrite problem

         

joeinnantucket

2:35 am on Sep 30, 2010 (gmt 0)

10+ Year Member



Good evening everyone,
I've searched around various places looking for an extensionless mod_rewrite solution. My site is in PHP, and I found one solution offered by jdMorgan, but it didn't work:

RewriteEngine On

# Externally redirect direct client requests for URLs ending in .php to extensionless URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+\.php([?#][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1 [R=301,L]
#
# Internally rewrite extensionless URLs to add ".php" if a corresponding php file exists
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(([^/]+/)*[^.]+)$ /$1.php [L]

I already changed the links on my pages to point to the extensionless URLs; however, I am receiving 404 errors.

If anyone can help, I would greatly appreciate it!

g1smd

6:59 am on Sep 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month





jdMorgan

3:00 pm on Sep 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please define "did not work" a bit more clearly.
  • What URL did you type as a test?
  • What were the expected results?
  • What were the actual results?
  • How did the expected and actual results differ?

    Did you place this code in a .htaccess file, or in a server config file?

    Also, do you have any other working RewriteRules in this file? Have you tested a simple rule like
     RewriteRule ^foo\.html$ http://www.google.com/ [R=301,L]

    in that same file?

    Jim
    sublime1

    4:52 pm on Sep 30, 2010 (gmt 0)

    10+ Year Member



    Do I understand correctly that this is what you want:

    When you have a request like:

    http://example.com/foo.php


    you want it to
    1) Return a 301 redirect to the browser with the URL http://example.com/foo, then
    2) When a request arrives for any path whose name resolves to a file having a .php extension, serve that file.

    Assuming that the host name is consistent, I think it can be simpler, something like this:

    # Redirect any request with a .php extension to have no extension
    RewriteRule ^(.*)\.php $1 [R=301,L]
    # Extensionless requests that are .php files should be served directly
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule ^(.*)$ $1.php [L]


    Note: you may need slightly different syntax if you're doing this from a server context (e.g. in a Virtual Host file) versus if you're doing this from an .htaccess file in the root of your server -- in particular, I think in the latter case you may need to add a forward slash. See the note in the "Per Directory Rewrites" section of [httpd.apache.org...]

    It looks like you're doing work in the regexes that Apache already does for you with query string parameters and # anchors, as well as path names.

    So for example, I think a request to

    http://example.com/path1/path2/foo?param1=bar1&param2=bar2

    would get served correctly by this rewrite if, for example, there were a file on your web server with a (unix) path like

    /var/www/example.com/path1/path2/foo.php


    I agree with Jim -- these things are best worked out with a blazingly simple example.

    And one other thing if you have access to them: RewriteLog and RewriteLogLevel can be tremendously helpful. Unfortunately, they are only settable in the server or virtual host contexts, not .htaccess files. But in one case, I was able to coerce a web host to add a rewrite log that I could read to our virtual host config while I worked out a gnarly problem.
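For anyone who does have access to the server or virtual host config, a minimal sketch of what enabling the log might look like on the Apache versions current at the time (2.2 and earlier). The log path here is purely a placeholder, not something from this thread:

```apache
# Server or virtual host context only; not permitted in .htaccess.
# The file path below is a hypothetical example.
RewriteLog "/var/log/apache2/rewrite.log"
# 0 disables logging; higher levels are more verbose (and slow the server down).
RewriteLogLevel 3
```

On Apache 2.4 and later these two directives no longer exist; the equivalent is a per-module log level such as `LogLevel alert rewrite:trace3`, which writes to the ordinary error log. Either way, turn the logging back off once the problem is solved.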

    Good luck!

    Tom

    joeinnantucket

    5:15 am on Oct 1, 2010 (gmt 0)

    10+ Year Member



    Thanks for your help, I should have been more clear.

    What I would like to do is to have URLs like www.example.com/page instead of www.example.com/page.php

    To answer jdMorgan's questions, using the code I displayed the first time:

    1. I typed in www.example.com/sailing
    2. I wanted to see my sailing.php page but with the www.example.com/sailing url
    3. I got a 404 error
    4. I received a Page Not Found error instead of seeing the page

    The code was in a .htaccess file and there are no other rewrite rules

    jdMorgan

    1:02 pm on Oct 1, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    While usable in a server config file inside a <Directory> container, the "simplified" code above will not work in .htaccess, because it will cause an infinite loop due to the recursive behavior of mod_rewrite in a per-directory context.

    Your original code contains logic to prevent this looping, and the regex patterns -- although more complex -- result in unambiguous, single-left-to-right-pass parsing of the HTTP request lines and requested URL-paths.

    Further, the protocol and hostname are explicitly stated in the redirect RewriteRule's substitution field in order to preclude problems when the ServerName is non-canonical and UseCanonicalName is set to "On." This setting is not under your control in shared hosting, so this is good insurance against a change or an error made by your host.

    For the time being, comment out your first rule (the redirect) until you get the second rule (the internal rewrite) working. Divide and conquer.

    I suggest adding the following two lines ahead of your first RewriteRule:

    AcceptPathInfo off
    Options +FollowSymLinks -Indexes -MultiViews
    #

    Note that the first line is only valid on Apache 2.x. The second line may be required, or it may not be allowed -- The only way to find out is to test it.
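As a rough sketch, the .htaccess file after these two changes might look something like the following. This assumes the rules from the first post are otherwise unchanged, and that the host permits the Options line:

```apache
AcceptPathInfo Off
Options +FollowSymLinks -Indexes -MultiViews
#
RewriteEngine On
#
# External redirect temporarily disabled while testing the internal rewrite
#RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+\.php([?#][^\ ]*)?\ HTTP/
#RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1 [R=301,L]
#
# Internally rewrite extensionless URLs to add ".php" if a corresponding php file exists
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(([^/]+/)*[^.]+)$ /$1.php [L]
```

If the Options line produces a 500 error, remove it and test again; once a request for /sailing serves sailing.php, uncomment the redirect rule.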

    Jim

    sublime1

    3:22 pm on Oct 2, 2010 (gmt 0)

    10+ Year Member



    Jim --

    <blush> You're quite right, of course, my example was wrong (in several ways).

    After a lot of testing locally with .htaccess (I usually use server/vhost config) I realize now I had completely failed to understand the key bit about recursion, which appears to be that, in effect, the whole rewrite process begins again after an internal rewrite -- presumably using the rewritten URL as a new starting point (because it's recursive).

    Thus, the main issue (in this case) would be: how can one differentiate between the original request and the internally rewritten one. My example fails to do that.

    So, if I am correct, in joeinnantucket's original example, the first RewriteCond prior to the external redirect is able to distinguish the second (recursive) request from the original by recognizing that the second request in %{THE_REQUEST} no longer has the .php extension whereas the internally rewritten URI, the filename, now does.

    Right? Wrong? Close?

    Thanks!

    g1smd

    5:46 pm on Oct 2, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Testing THE_REQUEST in a RewriteCond tests what the browser asked for.

    Use that to pick which requests should be redirected.

    jdMorgan

    5:55 pm on Oct 2, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Actually, it works the other way. %{THE_REQUEST} is always the HTTP request line sent by the client, and is never changed by any internal processing.

    On the other hand, the URL-path examined by RewriteRule gets updated on-the-fly by mod_rewrite and several other modules (e.g. mod_dir).

    So the combination of that rule+condition requires that in order to redirect, the .php request had to come from an HTTP client, and not as a result of a previous internal rewrite; it is the RewriteCond that fails if the .php was added as a result of a previous iteration of the xyz-to-xyz.php internal RewriteRule.

    Case1: Client requests "/xyz.php". RewriteRule matches and RewriteCond matches. Redirect to "/xyz" to remove ".php"

    Case2: Client requests "/xyz". RewriteRule fails to match, so RewriteCond is not processed. Pass control to subsequent internal /xyz-to-xyz.php rewrite.

    Case3: Mod_rewrite restarts after Case2 /xyz-to-xyz.php rewrite. RewriteRule matches, but RewriteCond fails, because client requested "/xyz", not "/xyz.php". No redirect is invoked in this case.

    Note that in place of %{THE_REQUEST}, one could test %{REDIRECT_STATUS} to prevent rewrite-redirect at a gross level. The problem is that this method cannot unambiguously determine that %{REDIRECT_STATUS} was set by the Case2 RewriteRule; it may have been set by any number of other directives. Therefore, I prefer to explicitly check %{THE_REQUEST}.
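For comparison, a minimal sketch of that coarser alternative. It assumes REDIRECT_STATUS is read as an environment variable through the %{ENV:...} syntax, and that the variable is empty on the first pass through the rules and non-empty after an internal rewrite:

```apache
# Redirect only when no internal rewrite/redirect has happened yet
# (REDIRECT_STATUS still empty). Coarser than testing THE_REQUEST,
# since other directives can also set this variable.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1 [R=301,L]
```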

    In order to make sense of this, some readers may need to be reminded that an external URL-to-URL redirect causes the server to send a response to the HTTP client (e.g. browser or search robot), telling it to re-request what it asked for initially, but using a new URL. This terminates the current HTTP transaction. Having sent a redirect response, the server never enters the content-handling phase of the Apache API for this request. It forgets all about this completed HTTP transaction, and the client must (at its own discretion) start a new one, using the URL provided in the server's previous redirect response.

    On the other hand, internal URL-to-filepath rewrites take place entirely within the context of a single HTTP transaction. Once the internal rewriting is finished, the server exits the URL-to-filepath translation phase of the API, and enters the content-handler phase.

    The above is a somewhat simplified description; the mod_rewrite recursion observed in the .htaccess context is a result of the fact that .htaccess files are actually processed in the "Fixup" phase of the Apache API, and the only way to know that all mod_rewrite "fixups" are completed is to keep processing the .htaccess code until no more RewriteRules get invoked.

    Therefore, the [L] flag must be seen to mean "Stop all rule processing for this pass through the rules in .htaccess." It is still well worth using, though, to prevent all rules from being processed every time mod_rewrite processes the .htaccess file. With the [L] flag on every rule, all rules will only be processed on the final pass through the .htaccess file. Then, when none are invoked, the server will "know mod_rewrite is finished" and enter the content-handling phase.

    Jim

    joeinnantucket

    2:27 am on Oct 3, 2010 (gmt 0)

    10+ Year Member



    I want to thank everyone for their help. So far I haven't been able to get it to work, and I'm wondering if it's more trouble than it's worth. A simple way is to just make a rewrite rule such as:

    RewriteRule ^([^/]*)/$ /page.php?name=$1 [L]

    I was able to get that to work. However, the site I wanted to use it for is quite small, only 7 pages, and the content won't change very often, so there will be no CMS involved. I was hoping for an easy way to remove the file extensions from the URLs.

    I apologize for wasting anyone's time and I appreciate all the help. This is a great community and again, I thank everyone for their help.