Filtering language subdirectory requests


PumaProd

5:41 pm on Nov 10, 2009 (gmt 0)

10+ Year Member



I have started to work with mod_rewrite and have succeeded in getting myself ... completely confused!
I have gone through the post from jdMorgan "Mod Rewrite: Stop Writing & Start Reading. Please!" and have tried to follow his recommendations.

Here is what I am trying to achieve: my site will have artificial language directories, i.e. /en and /fr.
For example:
www.example.com/en/request
or
www.example.com/fr/request

What I am trying to do is redirect to my default language if any other language code is requested, e.g.:
www.example.com/de/request
should redirect to
www.example.com/en/request

After this has been done, my rewrite rule comes into action:
RewriteRule ^(en¦fr)/(.*)$ $2?lang=$1

I am having big problems putting in place a condition that tests for first-level, two-letter language-code directory requests other than en or fr.

The result was either no effect or an internal server error, which is why I don't dare post it here.

Having searched through the forum and read the documents indicated in the charter, I am still no further forward.

Can anyone help here?

jdMorgan

6:44 pm on Nov 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, the post you cite was by TheMadScientist... Credit where credit is due. :)

Excluding requests for two-letter subdirectories other than en or fr should be quite simple. What have you tried so far?

Jim

PumaProd

9:09 am on Nov 11, 2009 (gmt 0)

10+ Year Member



Hi Jim,

Here is what I am trying:
# validate that any language subdirectory request is valid
RewriteCond %{REQUEST_URI} ^!(en¦fr)
RewriteRule ^([^/]+)/(.+)$ /en/$2 [L]

This allows all requests with en or fr to go through OK.
However, if I use another language code (e.g. de), I get a 404 Not Found response.

jdMorgan

1:22 pm on Nov 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are several problems in the RewriteCond, but otherwise the method is sound. The negation "!" must be done outside the pattern (it is a mod_rewrite operator, not a regex token), and unlike the URL-paths examined by RewriteRule, the URL-paths examined by RewriteCond %{REQUEST_URI} will always start with a slash:

# Internally rewrite subdirectory requests other than /en or /fr to force /en
RewriteCond %{REQUEST_URI} !^/(en¦fr)/
RewriteRule ^[^/]+/(.+)$ /en/$1 [L]

-or alternately-

# Internally rewrite subdirectory requests other than /en or /fr to force /en
RewriteCond $1 !^(en¦fr)$
RewriteRule ^([^/]+)/(.+)$ /en/$2 [L]

(Here we make use of the fact that RewriteRule has already done the work to 'capture' the initial URL-path-part, and we can refer to it as $1.)

You might want to consider, however, that this code now "outlaws" the use of any subdirectories except for /en and /fr. This will likely be a problem if your site is successful and grows, since you'll have to keep all content in either /en, /fr, or root. What about 'shared' resources, such as images, scripts, and CSS stylesheets -- do you really want to have to keep all of this in the root, or to have to duplicate it into the language subdirectories? There are also some "well-known-location" subdirectories such as the "/w3c" subdirectory for privacy-policy file storage that will break with this rule.

So you might consider being more specific in the rule, and rewriting *only* two-letter, all-alphabetic subdirectories which are not /en or /fr, leaving anything else alone:


# Internally rewrite two-letter (language) subdirectory requests other than /en or /fr to force /en
RewriteCond $1 !^(en¦fr)$
RewriteRule ^([a-z]{2})/(.+)$ /en/$2 [L]

Important note: Replace the broken pipe "¦" characters above with solid pipe characters before use; posting on this forum modifies the pipe characters.

Finally, you should consider whether you really want an internal rewrite, or should externally redirect any "wrong language-code" URL requests to /en instead. If you don't do this, you could end up with many, many duplicates of your English pages in search results, since mis-typed links (e.g. on other sites or in forums, blogs, etc.) to /<anything-but-en-or-fr>/<anything> will otherwise directly return an English page. These duplicates would 'dilute' the ranking of the corresponding English pages. I would recommend using an external redirect instead of an internal rewrite for this reason.


# Externally redirect two-letter (language) subdirectory requests other than /en or /fr to force /en
RewriteCond $1 !^(en¦fr)$
RewriteRule ^([a-z]{2})/(.+)$ http://www.example.com/en/$2 [R=301,L]
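For reference, here is a sketch of how that redirect might sit together with the `lang` rewrite from the original question. It assumes both rules live in a single .htaccess file in the document root, and it uses solid pipe characters. The `%{ENV:REDIRECT_STATUS}` condition is an added assumption, not part of the rules above: after a request such as /en/ab/cd is internally rewritten to ab/cd?lang=en, Apache runs the .htaccess rules again on the new path, and without that check "ab/cd" would itself match the two-letter redirect.

```apache
# Sketch only: assumes this .htaccess file is in the document root.
RewriteEngine On

# 1. Externally redirect two-letter (language) subdirectory requests other than /en or /fr to /en.
#    The REDIRECT_STATUS test (an added assumption) limits this rule to the client's original
#    request, so paths produced by rule 2 on an internal re-run are not redirected again.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond $1 !^(en|fr)$
RewriteRule ^([a-z]{2})/(.+)$ http://www.example.com/en/$2 [R=301,L]

# 2. Internally rewrite /en/<path> or /fr/<path> to <path>?lang=en or <path>?lang=fr.
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L]
```

The redirect comes first so that a wrong language code is corrected in the browser before the internal rewrite ever sees the request.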

Jim

PumaProd

2:17 pm on Nov 11, 2009 (gmt 0)

10+ Year Member



Hi Jim,

Thank you for this clarification. It's good to be able to talk with a real expert!

One further question, however, about your last recommendation. As I develop on my local system before uploading to the host, I use a local virtual host entry, so the hostname in the URL is different from the public one.

Would it be possible, and equally efficient, to use the Host header server variable %{HTTP_HOST} in place of the domain name?

Simon

jdMorgan

2:42 pm on Nov 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, just substitute "%{HTTP_HOST}" for the domain in the RewriteRule substitution, and add a RewriteCond to ensure that the hostname is not blank - as it will be for true HTTP/1.0 requests (if you ever get any).

RewriteCond %{HTTP_HOST} !=""
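Putting that together with the earlier redirect rule, a sketch of the host-independent version might look like this. It assumes the same document-root .htaccess file and uses solid pipe characters.

```apache
# Externally redirect two-letter (language) subdirectory requests other than /en or /fr to /en,
# reusing whatever hostname the client sent, so the same rule works on a local virtual host
# and on the live site.
# Only redirect when the Host header is present (it is absent in true HTTP/1.0 requests).
RewriteCond %{HTTP_HOST} !=""
RewriteCond $1 !^(en|fr)$
RewriteRule ^([a-z]{2})/(.+)$ http://%{HTTP_HOST}/en/$2 [R=301,L]
```

Because %{HTTP_HOST} is the Host header exactly as the browser sent it, any non-default port (for example a local server on :8080) is carried over into the redirect target as well.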

Also, consider that you could leave the domain as-is in the RewriteRule, and simply define that hostname as 127.0.0.1 in your hosts file, removing that hosts entry when your domain goes live.
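With the hosts-file approach, the local virtual host simply answers to the live hostname. A minimal sketch, where the DocumentRoot path is a placeholder and the access-control line assumes Apache 2.4:

```apache
# Hypothetical local virtual host that answers to the live hostname.
# Assumes the development machine's hosts file maps www.example.com to 127.0.0.1;
# remove that hosts entry to reach the real site again.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/path/to/local/site"
    <Directory "/path/to/local/site">
        # Let the .htaccess file containing the rewrite rules take effect
        AllowOverride All
        # mod_rewrite in .htaccess needs FollowSymLinks (or SymLinksIfOwnerMatch)
        Options +FollowSymLinks
        # Apache 2.4 syntax; on 2.2 use "Order allow,deny" and "Allow from all"
        Require all granted
    </Directory>
</VirtualHost>
```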

Jim