URL abbreviation RewriteRule clashing


mbrampton

6:41 am on Apr 7, 2004 (gmt 0)

10+ Year Member



I'm trying to combine the rewriting rule for Mambo's SEF Advance (Search Engine Friendly URLs rather than index.php?...) and a rewrite mapping for URL abbreviation. HTML can be greatly shortened by abbreviation and mapping (a technique pioneered by Yahoo). So the short URL is something like /r/txi/... and it is mapped into /templates/xismyname/images/... because the mapping file contains a line with txi /templates/xismyname/images. In my config below, the mapping file is called abbr_ktc.txt.

SEF Advance works by rewriting any URL that is not for a file or directory (the first RewriteRule below qualified by the first two RewriteCond directives). But these two conditions don't work with the abbreviated URL because (I think) it is not recognised as a file or directory. So the extra RewriteCond is added to match the /r/ of an abbreviated URI. Yet somehow this still does not work.

With the setup shown below, SEF Advance works perfectly, but the abbreviated URLs don't work. If all the RewriteCond lines and the first RewriteRule are commented out, then abbreviated URLs work perfectly, but obviously SEF Advance doesn't. How is the SEF Advance RewriteRule interfering with the abbreviations?

Anyone with ideas?

<VirtualHost 127.0.0.1:80>
ServerName test.ktc
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^/r/(.*)
RewriteRule ^/(.*) /index.php
RewriteMap abbr txt:/var/www/html/abbr_ktc.txt
RewriteRule ^/r/([^/]*)/(.*) ${abbr:$1}$2 [redirect=permanent,last]
DocumentRoot /var/www/html/KTC
</VirtualHost>

jdMorgan

2:54 pm on Apr 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mbrampton,

Welcome to WebmasterWorld [webmasterworld.com]!

I have a question about this line:


RewriteCond %{REQUEST_URI} !^/r/(.*)

This line will *prevent* any URL starting with /r/ from being rewritten to /index.php. Is that what you intended? (The "!" at the beginning of the regex pattern means "NOT".)

Jim

mbrampton

5:44 pm on Apr 7, 2004 (gmt 0)

10+ Year Member



Well, I thought it would, but that seems not to happen. The first RewriteRule is all about the search engine friendly URLs and I want it to NOT touch the abbreviated URLs that begin with /r/ and are typically img src links to graphics.

So the abbreviated URLs that begin with /r/ should be left untouched by the first rule and then processed by the rule associated with the mapping, and expanded to their full form.

However, this is not what happens. If the SEF stuff is in place, the links to graphics do not work. If the SEF stuff is commented out, the links to graphics do work. Yet the conditions appear to make them independent of each other. Why does it not work that way?

jdMorgan

8:13 pm on Apr 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I don't know. Let's see what this code does, and then you may be able to tell if it's doing something that you don't want it to do. We have a lot of trouble here because mod_rewrite is very powerful, and small errors of omission or commission can cause big problems. It is also fairly difficult to write an exhaustive description of precisely what you want the code to do in plain English.

RewriteEngine on
# If the requested filename is not a directory or does not exist
RewriteCond %{REQUEST_FILENAME} !-d
# and if the requested filename is not a file or does not exist
RewriteCond %{REQUEST_FILENAME} !-f
# and if the requested URI does not start with "/r/" (you can leave off the last (.*) bit - it makes no difference)
RewriteCond %{REQUEST_URI} !^/r/(.*)
# Then rewrite any non-blank request to /index.php and continue to process the following rules.
RewriteRule ^/(.*) /index.php
#
# Declare a rewrite map named abbr and of type "text" located in /var/www/html/abbr_ktc.txt
RewriteMap abbr txt:/var/www/html/abbr_ktc.txt
# Translate all URLs beginning with "/r" followed by "a first anything", then a slash, followed by "a second anything" to whatever URL-path the map returns for the "first anything" and then the untranslated "second anything", and do a permanent redirect and quit mod_rewrite processing.
RewriteRule ^/r/([^/]*)/(.*) ${abbr:$1}$2 [redirect=permanent,last]

Having worked through that, it may be that your rewrite map does not comprehend the filepath output from the first rule. In that case, you may want to try using the [PT] flag on the first rewrite, so that it outputs a URL and not a filepath. Alternatively, you could try using a canonical URL (http://www.mydomain.com/index.php) in the substitution of the first rule... Just a guess.

Jim

mbrampton

8:51 pm on Apr 7, 2004 (gmt 0)

10+ Year Member



Thanks, I think your descriptive run through is pretty much spot on. The PT flag doesn't crack it. Another thing that goes wrong is that when the directives quoted above are all in place, the style sheet also gets lost (presumably rewritten). Yet the style sheet IS a file, and therefore ought to fail the conditions for the first rewrite, and its URI does not start with /r/ so it ought not to be affected by the second rewrite.

What still baffles me is why the first rewrite has ANY effect on URIs that begin with /r/. The conditions ought to exclude them, yet plainly they are somehow affected. How can this interaction happen?

Is there a way to find out what the result of the rewrites actually is?

jdMorgan

8:54 pm on Apr 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Is there a way to find out what the result of the rewrites actually is?

Yes, see RewriteLog [httpd.apache.org] and RewriteLogLevel [httpd.apache.org].
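As a rough sketch of what that might look like on Apache 2.0 (the version in use in this thread), the two directives could be added inside the VirtualHost section. The log path below is an assumption for illustration and does not come from the original posts, and level 3 is only a moderate starting point. In Apache 2.4 these directives were removed in favour of LogLevel with a rewrite trace level.

```apache
# Hypothetical log location; any path the server can write to will do.
RewriteLog /var/log/httpd/rewrite_test_ktc.log
# 0 disables logging; higher values are more verbose (9 logs practically everything).
RewriteLogLevel 3
```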

Jim

mbrampton

10:41 pm on Apr 7, 2004 (gmt 0)

10+ Year Member



I'm dubious about blaming a pretty standard piece of software, but it seems to me that Apache is behaving oddly. It seems to be invoking rules that are in other VirtualHost sections, and it switched off my log when I commented out some directives in another VirtualHost section. I'm no longer sure what is going on. It is Apache version 2.0.47 which is what Mandrake 9.2 installs by default.

jdMorgan

11:21 pm on Apr 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmmm... That's pretty darn weird. You may have something going on with the precedence of various directives within containers or with the hierarchy of those containers used in httpd.conf such as <VirtualHost>, <Directory>, <Files>, etc. Some directives only work in the context of certain containers, as shown in the documentation, and it can take some time to get everything ordered "just right."

I agree about not blaming standard software. And I definitely agree you have a strange problem here. I think this is going to come down to some simple/subtle error or typo, and we'll all say, "Doh!" when you find it.

Sorry I can't be of more help -- Every site is different, and there are too many variables involved.

Keep going on the "comment it out" track; narrowing down the scope of the problem is a good tactic when you have weird "interference" problems like this.

Jim

mbrampton

7:19 pm on Apr 10, 2004 (gmt 0)

10+ Year Member



Well, finally cracked it, but I'm not sure too much of it warrants a "Doh"! Some of it gets pretty complex, and maybe nobody except me is interested. But I'm posting the results, just in case.

The rules that appeared to be in other VirtualHost sections were there, but what was causing the effect was a .htaccess file in the default document root. I was surprised to find that this affected processing, even in the case of a VirtualHost.

Apache seems not to like the log file being deleted without Apache being restarted (maybe that is reasonable), although it is hard to keep track of the log unless it is regularly cleared. Must remember to restart Apache.

The final VirtualHost section is:

<VirtualHost 127.0.0.1:80>
ServerName test.ktc
DocumentRoot /var/www/html/KTC
DirectoryIndex index.php
RewriteEngine on
RewriteCond /var/www/html/KTC%{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^/r/
RewriteRule ^/(.*) /index.php
RewriteMap abbr txt:/var/www/html/abbr_ktc.txt
RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]
</VirtualHost>

That is the section without the logging directives. Why is it this way? Logging showed that on the initial request for the web site's base URL, by the time rewrite processing starts, Apache has already appended index.html from the main configuration file's DirectoryIndex. It therefore seemed more efficient to put a DirectoryIndex directive into the VirtualHost section to get the right index file.

This then suggested that the RewriteCond testing for a directory was superfluous, since Apache will always add a default file name on the end anyhow. If it exists, well and good, otherwise, the RewriteRule kicks in and replaces the URI with index.php. At no time will the result ever be just a directory. That is OK, as I don't want people getting to directories.

Then we have the CRITICAL CHANGE, only found after scouring the internet for hours. While in .htaccess the test for a file works OK as in the first set of code in this thread, it doesn't work in the main configuration file or VirtualHost. I can't figure the exact reason for this, but it is necessary to precede %{REQUEST_FILENAME} with the path to the document root. Otherwise, the condition fails to recognise legitimate files, and everything goes wrong.
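One likely explanation, going by the mod_rewrite documentation: in per-server or VirtualHost context the rules run before the URL has been mapped to the filesystem, so %{REQUEST_FILENAME} initially holds the same value as %{REQUEST_URI} rather than a full path. In .htaccess the mapping has already happened, which would account for the difference. A minimal sketch of an equivalent test that avoids hard-coding the path, assuming %{DOCUMENT_ROOT} resolves to this VirtualHost's DocumentRoot:

```apache
# Build the filesystem path by hand, because at this stage
# REQUEST_FILENAME would still contain only the URL-path.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
```

This variant is a suggestion rather than something tested in the thread; the hard-coded prefix is what was reported to work.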

Incidentally, my original belief that the rewriting rules worked separately was mistaken, possibly because of the .htaccess file mentioned above, that I thought couldn't be having any effect (but was).

Not a simple problem to crack: the manual doesn't give enough information, and all the books I've been able to find just restate what is in the manual. Many thanks to jdMorgan for helping me work through this.

mbrampton

7:23 pm on Apr 10, 2004 (gmt 0)

10+ Year Member



PS Just in case anyone tries to use this information, please bear in mind that some of the directives have lost spaces in posting to this forum. Please check carefully to ensure the syntax you use is correct.

mbrampton

7:18 pm on Apr 11, 2004 (gmt 0)

10+ Year Member



PS The rewrite condition for a directory is not superfluous, and needs to go back in:
RewriteCond /var/www/html/KTC%{REQUEST_FILENAME} !-d
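
Putting the corrections from this thread together, the VirtualHost section would look roughly like the following. This is a reconstruction with the lost spaces restored and the directory test put back, not a configuration verified beyond what was reported above. Note that the redirect target is the map value followed directly by the rest of the path, so the sketch assumes entries in abbr_ktc.txt end with a slash where one is needed.

```apache
<VirtualHost 127.0.0.1:80>
    ServerName test.ktc
    DocumentRoot /var/www/html/KTC
    DirectoryIndex index.php
    RewriteEngine on
    # Send anything that is not an existing file or directory, and is not
    # an abbreviated /r/ URL, to /index.php for SEF Advance to handle.
    RewriteCond /var/www/html/KTC%{REQUEST_FILENAME} !-f
    RewriteCond /var/www/html/KTC%{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} !^/r/
    RewriteRule ^/(.*) /index.php
    # Expand abbreviated URLs through the text map and redirect to the full path.
    RewriteMap abbr txt:/var/www/html/abbr_ktc.txt
    RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]
</VirtualHost>
```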