RewriteMap and URL structure

Dynamic to static URL rewriting with exceptions for local files

         

dhatz

2:27 am on Mar 13, 2007 (gmt 0)

10+ Year Member


I want to do the usual dynamic -> static URL rewriting and would like your advice in case you see problems with my implementation that I don't. Currently the dynamic pages are of the form /index.php?displaythis=1234 or /index.php?displaythat=5678, etc.

I am thinking of a 1-level deep directory structure of the form

/location/widgettype-widgetname.html

[location], [widgettype] and [widgetname] will vary. But what if I also want to put static files and subdirectories under /location/? Just adding RewriteCond %{REQUEST_FILENAME} !-f doesn't seem to work...

Here is the code I've prepared to do the internal rewriting (seems to work, but hopefully the regexp gurus could rewrite it to be less generic):

RewriteMap map txt:/path/to/map.txt
RewriteCond %{REQUEST_URI} ^/(.*)/widgettype-(.*)\.html$
RewriteCond ${map:%1/widgettype-%2} !^$
RewriteRule ^/(.*)/widgettype-(.*)\.html$ /index.php?displaythis=${map:$1/widgettype-$2|0} [L]
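For reference, a plain-text map that this kind of rule could look up might be laid out as below. This is only a sketch: the keys and IDs are made-up placeholders, and the only thing taken from the post is the map path and the "location/widgettype-name" key shape.

```apache
# Contents of /path/to/map.txt (plain-text map: one "key value" pair per line).
# The keys and IDs shown here are hypothetical placeholders, not real data:
#
#   paris/widgettype-grand      1234
#   london/widgettype-central   5678
#
# The map is declared once in the server config or a virtual host;
# RewriteMap cannot be declared in .htaccess.
RewriteEngine On
RewriteMap map txt:/path/to/map.txt
```

With a map like that, ${map:paris/widgettype-grand} would expand to 1234, and a key that is not listed expands to the default value after "|" if one is given, or to an empty string otherwise.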

My questions:

1. How do I check for an existing file on disk before checking the RewriteMap?
2. If a request is neither a local file on disk nor found in the map, will it always fall through to Apache's (custom) ErrorDocument 404?
3. Are there any possible loops etc. that I might be unknowingly creating here?
4. Are there performance issues to be aware of? (map.txt will have ~2000 lines max)

Thanks in advance!

jdMorgan

2:17 pm on Mar 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> But what if I want to also put static files and sub-directories under /location/ , just adding RewriteCond %{REQUEST_FILENAME}!-f doesn't seem to work...

1. If what is to be checked is a rewritten location, then you'll need to include the new path info in the code that checks for file-exists. Something like:


RewriteCond %{DOCUMENT_ROOT}/location%{REQUEST_URI} !-f

This builds a new filepath similar to that of %{REQUEST_FILENAME}, but explicitly includes the subdirectory path-part "/location".
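To show where such a condition would sit, here is a minimal sketch for httpd.conf. The rule pattern, the target script, and the "page" parameter are assumptions for illustration only; "/location" is the placeholder subdirectory from the line above. As far as I know, in server-config context %{REQUEST_FILENAME} still holds the URL-path rather than a filesystem path when the rules run, which is one likely reason a bare "%{REQUEST_FILENAME} !-f" test appears not to work there.

```apache
RewriteEngine On

# Proceed only if no real file exists at the constructed filesystem path,
# i.e. <DocumentRoot>/location<requested URL-path>
RewriteCond %{DOCUMENT_ROOT}/location%{REQUEST_URI} !-f

# Hypothetical rule: hand top-level .html requests to a script.
# "index.php" and "page" are placeholders, not taken from the original site.
RewriteRule ^/([^/]+)\.html$ /index.php?page=$1 [L]
```

If the files to be checked live at the requested path itself rather than under a rewritten location, the "/location" part would simply be dropped from the condition.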

2. Yes, and you don't really need to explicitly check for the mapped file's existence; if RewriteMap doesn't find a translated address, the lookup will return an empty string, your "!^$" RewriteCond will fail, and no rewrite will take place -- see the RewriteMap documentation.

3. Loops occur in an .htaccess context when the output URL of one rule matches the pattern of a second rule, which then rewrites or redirects back to a URL that matches the pattern of the first rule. In simple cases, the first and second rules may be the same rule. For example, standing alone, the rule

 RewriteRule .* /some_page.php 

will always loop, since "some_page.php" will match the pattern ".*".

This is not usually a problem in httpd.conf or other server config files, unless the two rules mentioned above are separate, and one is an external redirect. The point is that mod_rewrite in .htaccess behaves recursively, but mod_rewrite in httpd.conf does not.
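One common way to break that kind of self-match in .htaccess is to exclude the rule's own target with a condition. A minimal sketch, reusing the some_page.php example from above:

```apache
RewriteEngine On

# Skip the rule when the request is already for the rewrite target;
# otherwise /some_page.php would match ".*" again on the next pass.
RewriteCond %{REQUEST_URI} !^/some_page\.php$
RewriteRule .* /some_page.php [L]
```

The [L] flag alone does not prevent this in .htaccess, because the rewritten URL is run through the rules again; the condition is what stops the second rewrite.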

4. 2000 should not be a problem. If you do see performance problems, then switch to using a hashed lookup map instead of a plain-text map.
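If it ever comes to that, the switch is mostly a matter of converting the file and changing the map type. A sketch, assuming the httxt2dbm utility that ships with Apache 2.x is available on your install, and using a hypothetical output path next to the text map:

```apache
# Convert the text map to a DBM file on the command line:
#   httxt2dbm -i /path/to/map.txt -o /path/to/map.dbm
#
# Then point RewriteMap at the DBM file instead of the text file.
# The ${map:...} lookups in the rules stay exactly as they were.
RewriteMap map dbm:/path/to/map.dbm
```

The DBM file has to be regenerated whenever map.txt changes, so it is worth doing only if the plain-text map actually shows up as a bottleneck.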

Regex cleanups: Avoid the use of ".*" whenever possible. It's familiar and seemingly easy to understand, but ambiguous, and potentially very inefficient, especially when more than one ".*" occurs in a pattern. You're often better off using a negative-match pattern (prevalent in the mod_rewrite Rewriting Guide) to look for the next character that you don't want to match. For example,


RewriteRule ^/([^/]+)/([^/.]+)\.html$

matches html files one subdirectory deep, placing the subdirectory in $1 and the filename (less extension) in $2, and can be parsed in a single left-to-right pass, because the parser can easily find the "end marker" of the URL-path-part that goes into each subpattern. Whereas, if you used

RewriteRule ^/(.*)/(.*)\.html$

this code would accept any number of subdirectories, a blank filename followed by ".html", and would require many "cut-and-try" passes to try to get a "best-fit" of the requested URI with the pattern -- with the exact number of tries multiplying quickly based on the number of characters in the requested URI.
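Applied to the rule from the first post, the same idea might look like the sketch below. It assumes the map keys have the form "location/widgettype-name" and that widget names contain no slashes or dots; adjust the character classes if they do.

```apache
RewriteEngine On
RewriteMap map txt:/path/to/map.txt

# $1 = location (no slashes), $2 = widget name (no slashes or dots).
# The condition is evaluated after the rule pattern matches, so $1 and $2
# are available here; rewrite only when the key is present in the map.
RewriteCond ${map:$1/widgettype-$2} !^$
RewriteRule ^/([^/]+)/widgettype-([^/.]+)\.html$ /index.php?displaythis=${map:$1/widgettype-$2} [L]
```

Because the RewriteRule pattern already constrains the URL, the separate RewriteCond on %{REQUEST_URI} is no longer needed, and requests that are not in the map fall through untouched.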

Jim

dhatz

9:28 pm on Mar 13, 2007 (gmt 0)

10+ Year Member


Jim, thanks a lot for your quick reply and info. Yes, I plan to do every configuration in the httpd.conf, not in .htaccess files.

1. Wrt checking the rewritten location, did you perhaps mean (without the location):

RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-d

I had already read your rewrite FAQ elsewhere in the forum, where you suggested that ideally one would use a /product/... type URL, so that all URLs-to-be-rewritten neatly fall in one place, so rules can be unambiguous and more reliable.

I wonder what other good methods I can use to further reduce potential problems, e.g. what about excluding system paths from rewriting?

RewriteCond %{THE_REQUEST} !^POST
RewriteCond %{REQUEST_URI} !^/(adodb|cache|config|css|images).*
RewriteCond %{REQUEST_URI} !^/(favicon\.ico|robots\.txt|index\.php).*

Or maybe for URLs like /location/widgettype-widgetname.html I should just list all the locations explicitly by name, e.g.

RewriteCond %{REQUEST_URI} ^/(location1|location2|location3|...|location100).*

but there are quite a few of them (~100); is there a maximum line length in the Apache config?

From my perspective, these solutions seem equal, but maybe someone more experienced can spot a potential issue (e.g. a difference or incompatibility in mod_rewrite between Apache 2.x and 1.3.x that would favor one over the other).

jdMorgan

11:34 pm on Mar 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The RewriteCond I posted was exactly as I intended it, given the discussion that preceded it. Everything depends on whether you wish to check the requested URL, or the rewritten URL, or both, to see whether they exist, and only you know that... :)

File-exists and directory-exists checks are "expensive" CPU-time-wise, and if you don't intend to rewrite directory URLs, then I suggest that you skip the directory-exists check to avoid *three* checks of the server filesystem per request. In addition to the exists-checks that you do with RewriteCond, remember that the server will do another before it serves the file that the rewritten URL finally resolves to, so the fewer filesystem checks you actually do, the better.

As I suggested in the thread you cited, it's better to "mark" the URLs you wish to redirect or rewrite with a uniquely-identifiable URL-tag, so that mod_rewrite does not have to be encumbered with many if-and-or-else conditional clauses. If you preface all of your product URLs with "prod" and all of your category URLs with "cat" then you can use two very-simple rules to redirect those to the products and categories scripts, respectively. Simple is good.
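As a rough sketch of what those two rules could look like in httpd.conf: the tags are assumed here to be leading path segments ("/prod/..." and "/cat/..."), and the script names and the "id" parameter are hypothetical placeholders.

```apache
RewriteEngine On

# /prod/<name> goes to the products script, /cat/<name> to the categories script.
# products.php, categories.php and "id" are placeholders for illustration.
RewriteRule ^/prod/([^/]+)$ /products.php?id=$1 [L]
RewriteRule ^/cat/([^/]+)$  /categories.php?id=$1 [L]
```

Since nothing else on the site starts with those tags, no file-exists checks or exclusion lists are needed for these rules.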

As to problems due to differences between Apache versions, I see none, and we can deal with them if they arise. Most problems have nothing to do with Apache differences, but rather with planning, design, or implementation errors in the server-side scripting or the Web site itself.

Jim