

mod rewrite redirection

         

Fireball22

11:24 am on May 9, 2010 (gmt 0)

10+ Year Member



Hello all,

I want to rewrite the URLs to redirect the visitors of my website directly to a URL which contains the corresponding language, for example:
www.mydomain.com/en/

Also domain.com should be redirected to www.domain.com.

There are some further rules, as you can see below:


RewriteCond %{HTTP_HOST} !^www\.[^.:]+\.(co\.[a-z]{2}|[a-z]{2,6})\.?(:[0-9]+)?$
RewriteRule ^(de|en)$ http://www.%{HTTP_HOST}/$1/ [R=301,L]

#RewriteCond %{HTTP_HOST} ^www\.[^.:]+\.(co\.[a-z]{2}|[a-z]{2,6})\.?(:[0-9]+)?$
#RewriteRule ^(de|en)$ http://%{HTTP_HOST}/$1/ [R=301,L]

RewriteRule ^(de|en)/(index\.php)?$ index.php?language=$1 [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(de|en)/(.*)$ $2?language=$1 [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(de|en)/(.*)$ $2 [L]

RewriteCond %{HTTP_HOST} !^www\.[^.:]+\.(co\.[a-z]{2}|[a-z]{2,6})\.?(:[0-9]+)?$
RewriteRule (.*) http://www.%{HTTP_HOST}/$1 [R=301,L]


The following problems still exist:
- If you type in www.example.com/en, you are redirected to www.www.example.com/en/
(The commented-out rule above is a solution that I tested, but is it possible to merge it with the other rules?)
- There is also a bug if you type in, e.g., example.com/en/: you are not redirected to www.example.com/en/, but to www.example.com/index.php?language=de instead of www.example.com/en/index.php?language=de.

I'm really looking forward to your tips!

Thank you very much in advance!

Best regards

Michael

[edited by: jdMorgan at 3:37 pm (utc) on May 9, 2010]
[edit reason] De-linked. Please use example.com [/edit]

jdMorgan

3:40 pm on May 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your first rule says to add "www" to any requested hostname that does not start with "www" and end with ".co.<country-code>". But you are testing with a hostname ending in ".com", so this rule adds the extra "www". The problem is the incorrect RewriteCond pattern.

Your other problems are caused by incorrect rule order. In general, put all external redirects first, in order from most-specific to least-specific, followed by all internal rewrites, again ordered from most- to least-specific.
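To see why the order matters, here is a simplified Python model of two of the rules from the question: the internal rewrite for ^(de|en)/(index\.php)?$ and the "add www." redirect. It assumes .htaccess (per-directory) context, where an internal rewrite ending in [L] is fed through the rule set again, and it replaces the HTTP_HOST RewriteCond with a plain "starts with www." test. The names `process`, `ORIGINAL_ORDER` and `REDIRECTS_FIRST` are hypothetical and exist only for this sketch.

```python
import re
from typing import List, Tuple

# (pattern, substitution, is_redirect, only_when_host_lacks_www)
# Substitutions use Python's \1 where mod_rewrite uses $1.
Rule = Tuple[str, str, bool, bool]

# Two rules from the question, in their original relative order (others omitted).
ORIGINAL_ORDER: List[Rule] = [
    (r"^(de|en)/(index\.php)?$", r"index.php?language=\1", False, False),  # [QSA,L]
    (r"^(.*)$", r"\1", True, True),  # http://www.%{HTTP_HOST}/$1 [R=301,L]
]
# Same rules with the external redirect moved to the top.
REDIRECTS_FIRST: List[Rule] = list(reversed(ORIGINAL_ORDER))


def process(rules: List[Rule], host: str, path: str, query: str = "") -> str:
    """Run the rules; after an internal rewrite with [L], start over (as in .htaccess)."""
    for _ in range(10):  # guard against rewrite loops
        for pattern, substitution, is_redirect, only_without_www in rules:
            # Simplified stand-in for the HTTP_HOST RewriteCond
            if only_without_www and host.startswith("www."):
                continue
            m = re.match(pattern, path)
            if m is None:
                continue
            new_path, _, new_query = m.expand(substitution).partition("?")
            if not new_query:
                new_query = query  # no query in the substitution: original passes through
            elif query:
                new_query = f"{new_query}&{query}"  # [QSA] appends the original query
            suffix = f"?{new_query}" if new_query else ""
            if is_redirect:
                return f"301 -> http://www.{host}/{new_path}{suffix}"
            if (new_path, new_query) == (path, query):
                return f"serve /{path}{suffix}"
            path, query = new_path, new_query
            break  # [L]: end this pass; the rewritten URL goes through the rules again
        else:
            return f"serve /{path}" + (f"?{query}" if query else "")
    raise RuntimeError("too many rewrite passes")
```

With the original order, `process(ORIGINAL_ORDER, "example.com", "en/")` returns "301 -> http://www.example.com/index.php?language=en", the same kind of index.php redirect described in the question, because the internal rewrite runs first and the www redirect then sees the rewritten path. With the redirect first, `process(REDIRECTS_FIRST, "example.com", "en/")` returns "301 -> http://www.example.com/en/".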

Jim

Fireball22

4:07 pm on May 9, 2010 (gmt 0)

10+ Year Member



Thank you very much for your help!

Now all bugs are solved except this one:
If I request www.domain.com/en without a "/" at the end, I am not redirected to www.domain.com/en/.
It only works if I request domain.com/en/ without the www!

But I can't find the bug...


RewriteCond %{HTTP_HOST} !^www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?$
RewriteRule ^(de|en)$ [%{HTTP_HOST}...] [R=301,L]

RewriteCond %{HTTP_HOST} !^www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?$
RewriteRule (.*) [%{HTTP_HOST}...] [R=301,L]

RewriteRule ^(de|en)/(index\.php)?$ index.php?language=$1 [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(de|en)/(.*)$ $2?language=$1 [QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(de|en)/(.*)$ $2 [L]


Best regards
Michael

jdMorgan

5:38 pm on May 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is probably the unnecessary RewriteCond on the first rule.

Do not redirect if the HTTP_HOST is blank (see the change to the 2nd rule's RewriteCond pattern).

Also, several 'security' problems in the last two rules need attention.

RewriteRule ^(de|en)$ http://www.%{HTTP_HOST}/$1/ [R=301,L]
#
RewriteCond %{HTTP_HOST} !^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
#
RewriteRule ^(de|en)/(index\.php)?$ index.php?language=$1 [QSA,L]
#
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(de|en)/(.+)$ /$2 [L]
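The change to the 2nd rule's RewriteCond wraps the whole hostname pattern in an extra optional group, so an empty HTTP_HOST (for instance from an HTTP/1.0 client that sends no Host header) also matches, and the negated condition then skips the redirect. A quick sketch that runs the same pattern through Python's `re` module, which handles these constructs the same way; the hostnames and the `adds_www` helper are illustrative only:

```python
import re

# Hostname pattern from the second rule's RewriteCond, without the leading "!".
# The outer "( ... )?" makes the whole hostname optional, so "" also matches.
CANONICAL_HOST = re.compile(r"^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$")


def adds_www(http_host: str) -> bool:
    """Model of `RewriteCond %{HTTP_HOST} !pattern`: redirect only when the pattern does NOT match."""
    return CANONICAL_HOST.search(http_host) is None


for host in ["example.com", "www.example.com", "www.example.com:8080", ""]:
    print(repr(host), adds_www(host))
```

Only "example.com" prints True; the www hosts (with or without a port) and the blank host print False.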

Further, unless it is absolutely necessary, I suggest that you do not use RewriteConds on the last two rules to check the disk for file and directory existence; those checks will make your server slow...
Instead of using exists checks, it is often possible to exclude certain URLs from being rewritten, either by URL-path or by "filetype". For example, you can exclude commonly-included objects with a RewriteCond like

RewriteCond $2 !\.(gif|jpe?g|png|ico|css|js)$

But this simple example may need to be modified or expanded depending on the details of your site's URL structure.
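Here $2 is the part of the path after the language prefix, captured by ^(de|en)/(.+)$. Assuming, as the advice implies, that the conditions are checked in order, putting this test before the -d/-f conditions lets requests for the listed file types fail it without any disk check. A minimal Python sketch of which captured paths pass the negated condition (the sample paths and the helper name are hypothetical):

```python
import re

# Pattern from the example condition: RewriteCond $2 !\.(gif|jpe?g|png|ico|css|js)$
STATIC_ASSET = re.compile(r"\.(gif|jpe?g|png|ico|css|js)$")


def passes_filetype_condition(captured: str) -> bool:
    """True when the negated condition succeeds, i.e. $2 does NOT end in a listed extension."""
    # RewriteCond patterns are not anchored at the start, so search() rather than match()
    return STATIC_ASSET.search(captured) is None


for captured in ["contact.php", "images/logo.png", "css/site.css", "photo.jpeg"]:
    print(captured, passes_filetype_condition(captured))
```

Only "contact.php" passes; the image and stylesheet paths are excluded before any file-system check would run.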

Jim

Fireball22

4:24 pm on May 10, 2010 (gmt 0)

10+ Year Member



What exactly do you mean by security problems in the last two rules?

I changed the code as follows:


RewriteRule ^(de|en)$ [%{HTTP_HOST}...] [R=301,L]
#
RewriteCond %{HTTP_HOST} !^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$
RewriteRule ^(.*)$ [%{HTTP_HOST}...] [R=301,L]
#
RewriteRule ^(de|en)/(index\.php)?$ index.php?language=$1 [QSA,L]
#
#RewriteCond %{REQUEST_FILENAME} !-d
#Only rewrite PHP-Scripts
RewriteCond $2 !\.php$
RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]
#
#RewriteCond %{REQUEST_FILENAME} !-f
#Only rewrite PHP-Scripts
RewriteCond $2 !\.php$
RewriteRule ^(de|en)/(.+)$ /$2 [L]


I replaced the file- and directory-exists checks with a condition that only matches PHP files. But what about GET parameters: are they also included in this condition?

And unfortunately www.domain.com/de is being rewritten to www.www.domain.com/de/ again.

Best regards
Michael

jdMorgan

5:38 pm on May 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> What do you mean exactly with security-problems in the last two rules?

There's a security problem addressed by the changes I made, but I don't care to say more in a public forum.

---

Query strings are not 'visible' to the RewriteRule pattern-matching or to RewriteConds which back-reference the RewriteRule's pattern-match ($1 - $9). If you wish to check query string data, use a RewriteCond testing %{QUERY_STRING}.

If the desired function of your rule does not depend on the query string, then you can ignore it -- Any query string appended to the requested URL-path will pass through your rules unchanged unless you explicitly replace the query data or add query data to the existing query string (using the [QSA] flag) in the RewriteRule's substitution.

To expand on that a bit, query strings are not treated as part of the URL-path because they do not indicate the *location* of a resource. Instead, query strings contain data to be passed to the resource (e.g. a script) *at* that URL. That's why I used the phrase "query string appended to the URL-path."
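A minimal Python sketch of that split for the rule RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]: the pattern is matched against the URL-path only, and the original query string is handled separately. It assumes the rules live in an .htaccess file in the document root, so the pattern sees the path without its leading slash; the `rewrite` function and the sample URLs are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]
RULE = re.compile(r"^(de|en)/(.+)$")


def rewrite(request_uri: str, qsa: bool = True) -> Optional[str]:
    """Match the URL-path only, then build the target and handle the query string."""
    parts = urlsplit(request_uri)
    # Assumes .htaccess in the document root: the pattern sees the path without its leading "/"
    path = parts.path.lstrip("/")
    m = RULE.match(path)  # the query string is never part of what the pattern sees
    if m is None:
        return None
    target = f"/{m.group(2)}?language={m.group(1)}"
    if qsa and parts.query:
        target += "&" + parts.query  # [QSA] appends the original query string
    # Without [QSA], the query in the substitution replaces the original one
    return target
```

For example, `rewrite("/en/contact.php?id=5")` gives "/contact.php?language=en&id=5", while `rewrite("/en/contact.php?id=5", qsa=False)` gives "/contact.php?language=en".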

---

In the absence of any specific RewriteRules pertaining to such URLs, Apache mod_dir will redirect a request for an existing directory path that is missing a trailing slash to the same path with the trailing slash added.

Jim

Fireball22

2:25 pm on May 12, 2010 (gmt 0)

10+ Year Member



OK, I changed several things, as you can see below.
Please let me know if there are still bugs in it. ;)


#1)Rewrite www.domain.com/<language> to www.domain.com/<language>/ -> No, it's not for this case, what does this rule exactly do?
RewriteCond %{HTTP_HOST} ^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$
RewriteRule ^(de|en)$ http://%{HTTP_HOST}/$1/ [R=301,L]
#
#2)Following rule is incorrect, double www. --> www.www.domain.com
#RewriteCond %{HTTP_HOST} !^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$
#RewriteRule ^(de|en)$ http://www.%{HTTP_HOST}/$1/ [R=301,L]
#
#3)Add www. subdomain to requests without www.
#IMPORTANT: What about GET parameters? QSA flag? OK: NO QSA flag, because we don't change any GET parameters
RewriteCond %{HTTP_HOST} !^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
#
#4)If request contains automatically <language> caption, then rewrite to save the language
#IMPORTANT: Maybe also other filenames next to index.php are possible --> The following rule exactly does this
RewriteRule ^(de|en)/(index\.php)?$ index.php?language=$1 [QSA,L]
#
#5)
#RewriteCond %{REQUEST_FILENAME} !-d
#RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]
#
#6)
#RewriteCond %{REQUEST_FILENAME} !-f
#RewriteRule ^(de|en)/(.+)$ /$2 [L]
#
#7)
#If requested file is a php-file, then forward to the corresponding <language>/<file>
#IMPORTANT: check behavior when file does not exist, error404 page or not?
RewriteCond $2 ^\.php$
RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]


Now it's possible to reach my site by requesting www.domain.com/en/ or www.domain.com/en without ending up at www.www.domain.com/en/ or something similar.
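Rule #1 no longer doubles the "www." because it only fires when the host already matches the www pattern (the RewriteCond is not negated) and its substitution reuses %{HTTP_HOST} without prepending anything. A simplified Python model of rule #1 as written above; the `rule_1` helper and the sample hosts are hypothetical:

```python
import re
from typing import Optional

# Hostname pattern from rule #1's (non-negated) RewriteCond
CANONICAL_HOST = re.compile(r"^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$")
# Rule #1's RewriteRule pattern: a bare language segment with no trailing slash
LANG_ONLY = re.compile(r"^(de|en)$")


def rule_1(http_host: str, path: str) -> Optional[str]:
    """Model of rule #1: redirect "en" to "en/" only for hosts that already start with www."""
    m = LANG_ONLY.match(path)
    if m is None or CANONICAL_HOST.match(http_host) is None:
        return None  # rule does not apply; later rules get a chance
    # Substitution is http://%{HTTP_HOST}/$1/; the host is reused as-is, with no extra "www."
    return f"http://{http_host}/{m.group(1)}/"
```

`rule_1("www.example.com", "en")` returns "http://www.example.com/en/", while a request to example.com/en is left to rule #3, which adds the www first.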

But I don't know whether it is possible to merge some rules together, maybe the first and third?

I also tried to enable the PHP-file check, but every time I request a PHP page I get a 404 error. What's wrong here? (Rule 7)

Thank you very much for your help in advance!

Best regards
Michael

[edited by: jdMorgan at 3:02 pm (utc) on May 12, 2010]
[edit reason] Disabled auto-linking for readability [/edit]

jdMorgan

3:17 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



#1 This rule adds a trailing slash if either example.com/en or example.com/de is requested.

#2 This rule is entirely redundant with rule #3. Delete rule #2.

#5 & #6. Consider excluding images and other resources which are NOT language-dependent. This may improve the performance of your server dramatically, since it will prevent a lot of unnecessary disk checking:

#5
RewriteCond $2 !\.(gif|jpe?g|png|ico|css|js)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]
#
#6
RewriteCond $2 !\.(gif|jpe?g|png|ico|css|js)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(de|en)/(.+)$ /$2 [L]

(Filetypes shown here are examples only. Adjust the pattern to suit your site.)

#7 The RewriteCond pattern is incorrect and will only match when $2 is *exactly* ".php", i.e. a request such as example.com/en/.php
It should be

RewriteCond $2 \.php$

with no start-anchor on the pattern.
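A short Python check of the two patterns against a few captured $2 values shows the difference; the sample paths are hypothetical:

```python
import re

# Rule #7 as originally posted: RewriteCond $2 ^\.php$
anchored = re.compile(r"^\.php$")
# Corrected condition: RewriteCond $2 \.php$
unanchored = re.compile(r"\.php$")

for captured in ["contact.php", "shop/cart.php", ".php", "logo.png"]:
    print(f"{captured!r}: original={bool(anchored.search(captured))}, fixed={bool(unanchored.search(captured))}")
```

The anchored pattern only matches the literal ".php", while the unanchored one matches any path ending in ".php" and still rejects "logo.png".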

Jim

Fireball22

1:12 pm on May 13, 2010 (gmt 0)

10+ Year Member



Thank you very much for your professional help!

Instead of excluding file types, I chose to include only the file types that are language-dependent.
And the only files that contain language-dependent content are the PHP files.

So these are the rules I would finally use:


#Adds trailing slash: www.domain.com/<language> --> www.domain.com/<language>/
RewriteCond %{HTTP_HOST} ^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$
RewriteRule ^(de|en)$ [%{HTTP_HOST}...] [R=301,L]

#Adds the www-subdomain
RewriteCond %{HTTP_HOST} !^(www\.[^.:]+\.([a-z]{2,6})\.?(:[0-9]+)?)?$
RewriteRule ^(.*)$ [%{HTTP_HOST}...] [R=301,L]

#If request contains already <language> caption, then rewrite to indicate the actual language
RewriteRule ^(de|en)/(index\.php)?$ index.php?language=$1 [QSA,L]

#5)
#RewriteCond %{REQUEST_FILENAME} !-d
#RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]

#6)
#RewriteCond %{REQUEST_FILENAME} !-f
#RewriteRule ^(de|en)/(.+)$ /$2 [L]

#7)
#If requested file is a php-file, then forward to the corresponding <language>/<file>
#IMPORTANT: check behavior when file does not exist, error404 page or not?
#RewriteCond $2 ^\.php$ #FALSE statement
RewriteCond $2 \.php$
RewriteRule ^(de|en)/(.+)$ /$2?language=$1 [QSA,L]


But there is one thing I don't understand:
- Why does rule #5 add a GET parameter language=$1 if the requested file is a folder? A folder can't store a language?!
- And why doesn't rule #6 add a GET parameter to store the language?

It doesn't really matter, because I replaced #5 and #6 with #7, but I'm trying to understand why I wrote them that way.

And in your opinion, is rule #7 specific enough to replace the two older rules, and are there any bugs left?
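Rules #5 and #6 are easier to follow when their conditions are written out as ordinary branching. The sketch below is a simplified model that leaves out query strings; `is_directory` is a hypothetical stand-in for Apache's -d test on %{REQUEST_FILENAME} (the file-system path the requested URL maps to). Note the "!" in "!-d": rule #5 fires when the path is NOT a directory.

```python
import re
from typing import Optional

# Both rules share this pattern: RewriteRule ^(de|en)/(.+)$ ...
LANG_PATH = re.compile(r"^(de|en)/(.+)$")


def rules_5_and_6(path: str, is_directory: bool) -> Optional[str]:
    """Hypothetical model; is_directory stands in for the -d test on %{REQUEST_FILENAME}."""
    m = LANG_PATH.match(path)
    if m is None:
        return None
    lang, rest = m.group(1), m.group(2)
    # Rule #5: RewriteCond %{REQUEST_FILENAME} !-d, so it fires when the path is NOT a directory
    if not is_directory:
        return f"/{rest}?language={lang}"
    # Rule #6 is only reached when #5's condition failed, i.e. the path IS a directory.
    # A directory is never a regular file, so "!-f" holds and the language prefix is just dropped.
    return f"/{rest}"
```

So `rules_5_and_6("en/contact.php", False)` gives "/contact.php?language=en", and the language-free rewrite of rule #6 only happens for paths such as "en/images" that map to an existing directory.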

Best regards
Michael