I'm trying to set up a directory rule in my httpd.conf to test removing file extensions from the URLs of the documents in that directory.
So right now I have:
www.mysite.com/test/test.shtml
I'd like to change it to www.mysite.com/test/test
And redirect any requests to test.shtml to the new test.
How would I go about doing that? I think I can get the 301s working, but I'm not sure how to remove the extensions and still have the pages render properly.
Thanks for any help!
Then you use one or more mod_rewrite RewriteRules to 'find' the proper files associated with those extensionless URLs, when those URLs are requested from your server by a client (e.g. a click on an extensionless link).
Finally, if a URL is requested *with* an extension, you 301-redirect to the extensionless URL. It is important to do this only if the original client request was sent with an extension -- You must use a RewriteCond to check %{THE_REQUEST} to do this, so as to prevent an 'infinite' rewrite/redirect loop due to interaction with the rule in the second step above.
While the first two steps are required, this last step is not. It is usually done to speed up re-indexing of the site with its new extensionless URL-set, to preserve traffic from old, un-updated inbound links and the PageRank/Link-popularity from those links, and to preserve the function of your visitors' old bookmarks. On a new site with extensionless URLs, the only reasons to do this last step would be to make sure that your internal site workings remain 'hidden' and to prevent certain exploits.
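The rewrite and redirect steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the rules live in an .htaccess file in the document root, mod_rewrite is enabled, the files are the .shtml files from the original question, and www.mysite.com is the placeholder hostname used in this thread. Testing for the file with -f is one possible choice, not the only one.

```apache
RewriteEngine On

# Optional step: 301-redirect only when the CLIENT asked for a .shtml URL.
# THE_REQUEST holds the original request line (e.g. "GET /test/test.shtml HTTP/1.1")
# and is not changed by internal rewrites, so this rule cannot loop with the one below.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/([^?\s]+)\.shtml[?\s]
RewriteRule \.shtml$ http://www.mysite.com/%1 [R=301,L]

# Required step: internally map an extensionless URL to its .shtml file,
# but only if that file actually exists (assumed check).
RewriteCond %{DOCUMENT_ROOT}/$1.shtml -f
RewriteRule ^([^.]+)$ /$1.shtml [L]
```

With these rules, a request for /test/test.shtml would be redirected to /test/test, and a request for /test/test would be served from /test/test.shtml without the visible URL changing. In an httpd.conf `<Directory>` section the same per-directory behaviour applies, though the patterns may need adjusting to the directory in question.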
Remember, files need extensions (for example, to tell the server how to handle them properly as well as which MIME-type header to send to tell the client how to handle them), but URLs don't -- That's why we speak of "file extensions" even when discussing URLs. URLs and filenames are not at all the same thing, and need not resemble each other, because mod_rewrite and/or scripts can be used to modify the default URL-to-filename mapping of the server.
The most basic function of an HTTP server is to translate from the URL system used on the Web to the proprietary (and often arbitrary) file-naming system used by the server, its operating system, and its webmaster(s). The purpose of a URL is to provide a resource location method that is independent of servers' operating systems and filesystems.
There are many threads here on extensionless URL handling. Try a site search (link at top left of this page) for "extensionless URL RewriteCond THE_REQUEST" for fairly well-targeted results.
Jim
I managed to fix the problem by changing my code to match html extensions in %{THE_REQUEST} rather than in %{REQUEST_URI}. Before I had done this, I was creating an infinite rewrite/redirect as Jim describes.
My question is: why did it make a difference when I changed from %{REQUEST_URI} to %{THE_REQUEST}? I've posted the external redirect code here. Happy to post the rest of the code I was using if that is necessary.
Old code to catch .html requests that created infinite loop:
RewriteCond %{REQUEST_URI} .*\.html$
RewriteRule ^(.*)\.html$ http://www.mysite.co.uk/private/development/$1 [R=301,L]
New code that works:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP
RewriteRule ^(.*)\.html$ http://www.mysite.co.uk/private/development/$1 [R=301,L]
Thanks for your help!
zedjay
I presume this means that REQUEST_URI is altered step by step by the rewrite rules we give it (i.e., the whole idea behind being able to incrementally adapt a URL with rewrites), whereas THE_REQUEST remains static. I'm really new to using .htaccess and Apache, so I'm just getting my head around it.
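To make that difference concrete, here is an annotated sketch. The redirect rule is the working one posted above; the internal rewrite was not posted in the thread, so the one shown here is a guess added only for illustration, as is the assumption that the rules sit in /private/development/.htaccess.

```apache
# Trace for a client request of the extensionless URL:
#
#   client sends:  GET /private/development/page HTTP/1.1
#
#   pass 1: the internal rewrite maps the URL to page.html
#       REQUEST_URI  is now  /private/development/page.html          (changed)
#       THE_REQUEST  is still GET /private/development/page HTTP/1.1 (unchanged)
#
#   pass 2: rules run again on the rewritten URL
#       a condition on REQUEST_URI sees ".html" and redirects to /page;
#       the browser then requests /page again and the cycle repeats
#       a condition on THE_REQUEST sees no ".html", so no redirect is issued

# Redirect only when the client itself asked for a .html URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP
RewriteRule ^(.*)\.html$ http://www.mysite.co.uk/private/development/$1 [R=301,L]

# Hypothetical internal rewrite (not shown in the original post)
RewriteCond %{DOCUMENT_ROOT}/private/development/$1.html -f
RewriteRule ^([^.]+)$ /private/development/$1.html [L]
```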
One last thing still has me a little confused. It seems intuitive to me that after an external redirect all the rules will be run again, as the .htaccess file is essentially saying 'not at this location, try somewhere else', so a new request is sent. However, it surprised me that after an internal rewrite marked [L], the URL was passed back to the top rule and run through the rules again. After a while I realised what was going on, since I was getting a loop, but I'd interpreted the Apache mod_rewrite docs as saying that [L] forced the rules to stop and return the content to the browser. Any chance you could give a layman's explanation of why this happens?
Thanks!
The RewriteRule [L] flag terminates rule processing for the current iteration only. If any rewrite has been invoked in the current iteration, rule-processing begins again from the top.
Consider that some rewriterules are used to enforce access control (see RewriteRule [F] flag). It is also possible that an errant rewrite might result in a match with a rule that triggers a 410-Gone response (see [G] flag). For these and other reasons, the rewritten URL must be checked again, in case it matches an access-control or non-existent-file rule.
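A small illustration of why that re-check matters, assuming an .htaccess file in the document root. Both directory names are hypothetical and the second rule is deliberately "errant":

```apache
RewriteEngine On

# Access control: any URL under /private/ gets a 403 Forbidden response
RewriteRule ^private/ - [F]

# Errant rewrite: maps public /docs/ URLs into the protected directory
RewriteRule ^docs/(.*)$ /private/$1 [L]
```

On the first pass a request for /docs/report does not match the [F] rule and is rewritten to /private/report. Because rule processing restarts for the rewritten URL, the [F] rule then matches and the request is refused rather than slipping past the access control.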
Therefore, mod_rewrite in an .htaccess context requires explicit loop prevention -- either in the case where a substitution URL matches the rule's own pattern, or in the case where two complementary rules contain substitutions which match each other's patterns (which was the case with your rewrite-then-redirect looping).
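Two common ways to add that loop prevention are sketched below, assuming an .htaccess file in the document root. The /app/ directory and the .html mapping are hypothetical names used only for illustration.

```apache
RewriteEngine On

# Case 1: the substitution would match the rule's own pattern.
# Without the condition, /app/foo would be rewritten again to /app/app/foo, and so on.
RewriteCond %{REQUEST_URI} !^/app/
RewriteRule ^(.*)$ /app/$1 [L]
```

```apache
RewriteEngine On

# Case 2: two complementary rules (redirect .html -> extensionless, rewrite extensionless -> .html).
# Testing THE_REQUEST restricts the redirect to URLs the client actually asked for.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/([^?\s]+)\.html[?\s]
RewriteRule \.html$ /%1 [R=301,L]

# REDIRECT_STATUS is empty on the initial pass and set after an internal redirect,
# so this condition keeps the rewrite from being applied to its own output.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^([^.]+)$ /$1.html [L]
```

On Apache 2.4 the [END] flag can be used in place of [L] on an internal rewrite to stop all further rewrite processing for the request, which may be simpler where that version is available.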
Jim