I am thinking of a 1-level deep directory structure of the form
/location/widgettype-widgetname.html
[location], [widgettype] and [widgetname] will be changing. But what if I also want to put static files and sub-directories under /location/? Just adding RewriteCond %{REQUEST_FILENAME} !-f doesn't seem to work...
Here is the code I've prepared to do the internal rewriting (seems to work, but hopefully the regexp gurus could rewrite it to be less generic):
RewriteMap map txt:/path/to/map.txt
RewriteCond %{REQUEST_URI} ^/(.*)/widgettype-(.*)\.html$
RewriteCond ${map:%1/widgettype-%2} !^$
RewriteRule ^/(.*)/widgettype-(.*)\.html$ /index.php?displaythis=${map:$1/widgettype-$2|0} [L]
My questions:
1. how to check for existing file on disk, before proceeding to check rewritemap?
2. if it is not a local file on disk AND not found in the map, will the request always fall through to Apache's (custom) ErrorDocument 404?
3. Any possible loops etc to be aware, that I might unknowingly be creating here?
4. Performance issues to be aware of? (map.txt will have ~2000 lines max)
Thanks in advance!
1. If what is to be checked is a rewritten location, then you'll need to include the new path info in the code that checks for file-exists. Something like:
RewriteCond %{DOCUMENT_ROOT}/location%{REQUEST_URI} !-f
2. Yes, provided you keep the check on the map result. If RewriteMap doesn't find a key, the lookup expands to an empty string (or to the default value given after the "|"); it does not by itself stop the rule from matching. See the RewriteMap documentation.
3. Loops occur in an .htaccess context when the output URL of one rule matches the pattern of a second rule, which then rewrites or redirects back to a URL that matches the pattern of the first rule. In simple cases the first and second rule may be the same rule. For example, standing alone, the rule
RewriteRule .* /some_page.php
will be applied again to its own output in .htaccess, because the rewritten URL /some_page.php itself matches ".*". This is not usually a problem in httpd.conf or other server config files, unless the two rules mentioned above are separate and one is an external redirect. The point is that mod_rewrite in .htaccess behaves recursively, but mod_rewrite in httpd.conf does not.
4. 2000 should not be a problem. If you do see performance problems, then switch to using a hashed lookup map instead of a plain-text map.
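The four answers above can be pulled together into a rough sketch. This is an illustration under assumptions, not a tested configuration: the hashed map path /path/to/map.map is made up for the example, the httxt2dbm step assumes Apache 2.x, and the map keys are assumed to have the form location/widgettype-widgetname as in the original post.

```apache
# --- httpd.conf / <VirtualHost> (per-server context) ---
# Answer 4: build a hashed map once from the text map. Apache 2.x ships
# the httxt2dbm utility for this (output path is an assumed example):
#   httxt2dbm -i /path/to/map.txt -o /path/to/map.map
RewriteEngine On
RewriteMap map dbm:/path/to/map.map

# Answer 1: leave real files alone. In per-server context REQUEST_FILENAME
# is not yet mapped to the filesystem, so the path is built explicitly.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# Answer 2: only rewrite when the map returns a non-empty value
# (condition kept from the original post; $1 and $2 come from the rule below).
RewriteCond ${map:$1/widgettype-$2} !^$
RewriteRule ^/([^/]+)/widgettype-([^.]+)\.html$ /index.php?displaythis=${map:$1/widgettype-$2} [L]

# --- .htaccess (per-directory context, a separate file) ---
# Answer 3: exclude the target script so the catch-all rule cannot be
# re-applied to its own output.
# RewriteCond %{REQUEST_URI} !^/some_page\.php$
# RewriteRule .* /some_page.php [L]
```

Requests that match neither a file on disk nor a map entry are left unrewritten, so they should end at the normal 404 ErrorDocument.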
Regex cleanups: Avoid the use of ".*" whenever possible. It's familiar and seemingly easy to understand, but ambiguous, and potentially very inefficient, especially when more than one ".*" occurs in a pattern. You're often better off using a negated character class (a pattern style prevalent in the mod_rewrite Rewriting Guide) that matches up to the next character you don't want to match. For example,
RewriteRule ^/([^/]+)/([^.]+)\.html$
is more specific and more efficient than
RewriteRule ^/(.*)/(.*)\.html$
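Applied to the rule from the original post, the difference might look like the sketch below. It assumes server-config context, RewriteEngine On, and the RewriteMap named "map" already declared as in that post; the sample URL /a/b/widgettype-c.html is made up for illustration.

```apache
# Greedy form from the original post: "(.*)" can span slashes, so
# /a/b/widgettype-c.html matches with $1 = "a/b" and $2 = "c".
# RewriteRule ^/(.*)/widgettype-(.*)\.html$ /index.php?displaythis=${map:$1/widgettype-$2|0} [L]

# Negated-class form: [^/]+ stops at the first "/" and [^.]+ stops at the
# first ".", so only one directory level is accepted and
# /a/b/widgettype-c.html no longer matches.
RewriteRule ^/([^/]+)/widgettype-([^.]+)\.html$ /index.php?displaythis=${map:$1/widgettype-$2|0} [L]
```

One consequence of [^.]+ is that a widget name containing a dot would not match; if such names exist, that class would need to be widened.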
Jim
1. Regarding checking the rewritten location, did you perhaps mean (without the location):
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-d
I had already read your rewrite FAQ elsewhere in the forum, where you suggested that ideally one would use a /product/... type URL, so that all URLs-to-be-rewritten neatly fall in one place, so rules can be unambiguous and more reliable.
I wonder what other good methods I can use to further reduce potential problems, e.g. what about excluding system paths from rewriting?
RewriteCond %{THE_REQUEST} !^POST
RewriteCond %{REQUEST_URI} !^/(adodb|cache|config|css|images).*
RewriteCond %{REQUEST_URI} !^/(favicon\.ico|robots\.txt|index\.php).*
Or maybe for URLs like /location/widgettype-widgetname.html I should just list all the locations explicitly by name, e.g.
RewriteCond %{REQUEST_URI} ^/(location1|location2|location3|...|location100).*
but there are quite a few (~100) - (max Apache conf line size?).
From my perspective, these solutions seem equal, but maybe someone more experienced can spot a potential issue (e.g. a difference or incompatibility in mod_rewrite between Apache 2.x and 1.3.x that would favor one over another).
File-exists and directory-exists checks are "expensive" CPU-time-wise, and if you don't intend to rewrite directory URLs, then I suggest that you skip the directory-exists check to avoid *three* checks of the server filesystem per request. In addition to the exists-checks that you do with RewriteCond, remember that the server will do another before it serves the file that the rewritten URL finally resolves to, so the fewer filesystem checks you actually do, the better.
As I suggested in the thread you cited, it's better to "mark" the URLs you wish to redirect or rewrite with a uniquely-identifiable URL-tag, so that mod_rewrite does not have to be encumbered with many if-and-or-else conditional clauses. If you preface all of your product URLs with "prod" and all of your category URLs with "cat" then you can use two very-simple rules to redirect those to the products and categories scripts, respectively. Simple is good.
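As a rough illustration of the tagged-URL idea, the two rules might look like this in server-config context. The /prod/ and /cat/ URL layout, the script names /products.php and /categories.php, and the query parameter names are all hypothetical, chosen only for the example.

```apache
RewriteEngine On

# Hypothetical layout: /prod/<name>.html and /cat/<name>.html.
# The unique prefix identifies the URLs to rewrite, so no file-exists or
# directory-exists RewriteCond is needed in front of either rule.
RewriteRule ^/prod/([^/.]+)\.html$ /products.php?item=$1 [L]
RewriteRule ^/cat/([^/.]+)\.html$ /categories.php?name=$1 [L]
```

Static files and sub-directories can then live anywhere outside /prod/ and /cat/ without being touched by these rules.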
As to problems due to differences between Apache versions, I see none, and we can deal with them if they happen; most problems have nothing to do with Apache differences, but rather with planning, design, or implementation errors in the server-side scripting or the Web site itself.
Jim