mod_rewrite - old pages to subdomains

My rules seem to work, but are they OK....

         

Robber

3:40 pm on Mar 4, 2004 (gmt 0)

10+ Year Member



Just started messing about with this and it seems pretty useful. My first stab seems to work OK but I thought I'd see if you guys have any tips/advice on anything that is perhaps not too efficient.

I have:

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.abc\.co\.uk
RewriteRule ^(.*)$ http://www.abc.co.uk$1 [R=301]

RewriteCond %{REQUEST_URI} /online-quotes/(.*)
RewriteRule ^(.*)$ http://quotes.abc.co.uk/%1 [R=301,L]

So the first part is to direct everyone to the www site even if they don't put the www in.

The second rule is to redirect anyone requesting:
http://www.abc.co.uk/online-quotes/asdf.php
to
http://quotes.abc.co.uk/asdf.php

Something I'm not quite getting though, in the RewriteRule the pattern is ^(.*)$, which I think says match zero or more of anything from the start to the end of the requested URL, so basically the whole URL.

The fact it is in () sets a backreference, and this is where I'm losing it a little. If the backreference contains the whole URL, why is the Substitution:
http://www.abc.co.uk$1

To me this suggests the rewritten URL would end up being:
http://www.abc.co.ukhttp://www.abc.co.uk/blahblahblah.

I can see that the backreference only actually contains the request_uri, but how does it get from matching the whole URL to only the URI?

Thanks for any help.

jdMorgan

10:15 pm on Mar 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Robber,

Because that's the way it works... :)
Otherwise, it might be quite limited in scope.

Read the documentation starting in the Substitution section of RewriteRule in the Apache mod_rewrite documentation [httpd.apache.org], and all will be revealed - See Note in about the sixth paragraph.

Jim

Robber

11:01 pm on Mar 4, 2004 (gmt 0)

10+ Year Member



Hi Jim,

Thanks for that, I've been looking at the documentation quite a bit today, but as it says in the intro "don't expect to understand this entire module in just one day."!

That note you mentioned states "In consequence, if negated patterns are used, you cannot use $N in the substitution string!"

This is troubling me a little at the moment since I am using a negated pattern but also using a backreference, which seems to go against the note - I'm sure I am misinterpreting something here?

Are we saying that because we can't have "grouped wildcard parts in the pattern" due to it being negated, the ^(.*)$ matches (and sets a backreference to) everything that is in the original URL but not in the RewriteCond CondPattern, or put another way, matches everything from the original URL which is not HTTP_HOST, which leaves only the REQUEST_URI?

Hmmm, this is going to take a little practise but thanks for your help!

jdMorgan

11:36 pm on Mar 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I cited the wrong location. The note I was referring to reads:

Note: Never forget that Pattern is applied to a complete URL in per-server configuration files. But in per-directory configuration files, the per-directory prefix (which always is the same for a specific directory!) is automatically removed for the pattern matching and automatically added after the substitution has been done. This feature is essential for many sorts of rewriting, because without this prefix stripping you have to match the parent directory which is not always possible.

That is the reason you don't need to/have to match the entire URL in a RewriteRule pattern.
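To make that concrete, here is a rough trace of the www rule from the first post, assuming it sits at server level in httpd.conf (outside any <Directory> container). As far as I understand the note, "complete URL" there means the full URL-path, not the scheme and hostname. The request URL in the comments and the added L flag are my own assumptions for illustration.

```apache
# Sketch only: server-level rules, outside any <Directory> container.
RewriteEngine On

# Example request (hypothetical): http://abc.co.uk/blahblahblah
#   HTTP_HOST             = "abc.co.uk"      (tested by the RewriteCond)
#   URL-path              = "/blahblahblah"  (tested by the RewriteRule pattern)
#   $1 captured by ^(.*)$ = "/blahblahblah"
#   Redirect target       = http://www.abc.co.uk/blahblahblah
RewriteCond %{HTTP_HOST} !^www\.abc\.co\.uk
RewriteRule ^(.*)$ http://www.abc.co.uk$1 [R=301,L]
```

The hostname never reaches the RewriteRule pattern, which is why the substitution has to spell it out and $1 only supplies the path.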

The note you are quoting says that you cannot do this:


RewriteRule !^(.*)\.php /$1.php [L]

to redirect all non-php files to their equivalent php files. It won't work because it's a negated pattern, and therefore the back-reference will be empty if the rule matches.
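One possible way around this, as a sketch rather than a tested recipe: move the negation into a RewriteCond and keep the RewriteRule pattern positive, so that $1 is actually filled. This assumes an .htaccess file in the document root, and like the example above it would also catch things like images and stylesheets.

```apache
# Sketch only: assumes .htaccess in the document root.
RewriteEngine On

# The negation lives here; a negated condition sets no back-reference.
RewriteCond %{REQUEST_URI} !\.php$

# Non-negated pattern, so $1 is populated:
#   "page.html"     -> $1 = "page"     -> /page.php
#   "dir/page.html" -> $1 = "dir/page" -> /dir/page.php
# URLs without a file extension do not match and are left alone.
RewriteRule ^(.+)\.[^./]+$ /$1.php [L]
```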

Similarly, you can't do this:


RewriteCond %{HTTP_HOST} !^www\.(.*)\.org
RewriteRule (.*) http://www.%1.org/$1 [R=301,L]

Because here the %1 back-reference is to a negated pattern in the RewriteCond, which will be empty if the RewriteCond matches.
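A hedged alternative, assuming the same .htaccess context as the example above: use the negated condition purely as a yes/no test, and add a second, non-negated condition to do the capturing. %1 refers to the last matched RewriteCond, which here is the positive one.

```apache
# Sketch only: assumes .htaccess, where $1 has no leading slash.
RewriteEngine On

# Negated condition: only a test, sets no back-reference.
RewriteCond %{HTTP_HOST} !^www\. [NC]

# Non-negated condition: captures the part before ".org" into %1,
# e.g. host "example.org" gives %1 = "example".
RewriteCond %{HTTP_HOST} ^(.+)\.org$ [NC]

RewriteRule (.*) http://www.%1.org/$1 [R=301,L]
```

The two conditions are ANDed by default, so the rule only fires for .org hosts that do not already start with www.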

Jim

Robber

12:05 am on Mar 5, 2004 (gmt 0)

10+ Year Member



Ah, that makes sense!

One more thing though for clarity, by per directory is this referring to rewrite rules placed in .htaccess, or is this referring to rewrite rules set in httpd.conf but for a subdomain as defined in a virtualhost container?

Hopefully you will say the latter and I will have this nailed, but I have a nasty feeling you will say it's the first one - my understanding was that .htaccess was used for per-directory definitions! I am defining my rules in the .conf file.

Cheers

jdMorgan

1:55 am on Mar 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Per-directory would be in .htaccess or in a <Directory> container in httpd.conf. Per-server means the rules sit outside any <Directory> container, in the main server config or a <VirtualHost> container, so they apply to every directory served by that host.

There's a difference in the pattern between server-level rules in httpd.conf and per-directory rules in .htaccess as well:

In httpd.conf, you'd write

 RewriteRule ^/index\.html$ /blah.blah [L]

while in .htaccess, you'd write
 RewriteRule ^index\.html$ /blah.blah [L]

This can be changed using RewriteBase, but most coders don't bother. So, if you work from any examples shown here, be sure to account for this difference.

In either case, the pattern used in RewriteCond %{REQUEST_URI} includes the leading slash.
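Putting the two contexts side by side, a minimal sketch might look like the following. The ServerName, DocumentRoot path, and page names are placeholders I made up, and the RewriteCond in the <Directory> block is redundant on purpose, just to show the leading slash.

```apache
# Sketch only: hypothetical virtual host, placeholder names and paths.
<VirtualHost *:80>
    ServerName www.abc.co.uk
    DocumentRoot "/var/www/abc"

    RewriteEngine On
    # Per-server context: the rule pattern sees "/online-quotes/asdf.php",
    # leading slash included.
    RewriteCond %{REQUEST_URI} ^/online-quotes/(.*)
    RewriteRule ^/online-quotes/ http://quotes.abc.co.uk/%1 [R=301,L]

    <Directory "/var/www/abc">
        Options +FollowSymLinks
        RewriteEngine On
        # Per-directory context: the directory prefix is stripped, so the
        # rule pattern sees "old-page.html", while REQUEST_URI is still
        # "/old-page.html".
        RewriteCond %{REQUEST_URI} ^/old-page\.html$
        RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
    </Directory>
</VirtualHost>
```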

Jim