New international characters challenge related to using mod_rewrite

International characters are being transformed into UTF-8 code

         

zyron

9:23 pm on Feb 27, 2005 (gmt 0)


This new challenge is related to a thread I posted earlier: [webmasterworld.com...]

After making some changes to the settings, the end result became quite nice:

Before, the use of international characters would look like this:
domain/b%C3%A5t

But now, they are being preserved as they are:
domain/båt

[båt is Norwegian and means boat]

International characters are transformed into percent-encoded UTF-8, which I decode in the script.
If I just type domain/båt, the encoding happens in the background and the address bar keeps
what I typed.
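
For reference, here is a minimal Python sketch of the round trip described above. It is not the script from this thread (which is not shown); it only illustrates how "båt" turns into "b%C3%A5t" and back when UTF-8 is assumed on both sides.

```python
from urllib.parse import quote, unquote

path = "båt"

# quote() percent-encodes non-ASCII characters using UTF-8 by default,
# so "å" (U+00E5) becomes the two bytes C3 A5.
encoded = quote(path)

# unquote() decodes the percent-escapes as UTF-8 by default.
decoded = unquote(encoded)

print(encoded)  # b%C3%A5t
print(decoded)  # båt
```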

The problem surfaces when I want to change the appearance of the URL.

I have added this code to transform " " into "_":
RewriteRule ^(.+)\ (.+) $1\_$2 [N,R]

This works perfectly for everything that does not contain international characters. For instance, "domain/blue coat" is transformed into "domain/blue_coat".

Without this rule, the result would be: "domain/blue%20coat"

So far, so good, but the rewrite rule starts behaving in an undesired way when I'm using international characters:
"domain/båt blue" is being transformed into "domain/b%C3%A5t_blue"

Without using the rule the result would be: "domain/båt%20blue"

I don't achieve much by adding the NE (noescape) flag:
RewriteRule ^(.+)\ (.+) $1\_$2 [N,R,NE]

Then the result is transformed into: "domain/båt_blue"

So, I'm wondering if anyone has any suggestions on how to solve this problem without having to modify the source code of the rewrite module?

To make matters worse, the behavior is different in Internet Explorer compared to Firefox!

It seems that Firefox encodes the string before sending it, so that the result looks like this:
"domain/b%E5t"

The string is then sent to the script as written (%E5 is å in ISO-8859-1, not UTF-8), but I get problems again because I use a UTF-8 decoder, so my string ends up looking like this: "b?"

That means I have to change the source code for the Firefox browser as well.. ;)
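
One possible way around the "b?" result, sketched in Python under the assumption that the script can inspect the raw request path: try UTF-8 first and fall back to ISO-8859-1 when the bytes are not valid UTF-8. The function name `decode_path` is hypothetical and not part of the script discussed here.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path: str) -> str:
    """Decode a percent-encoded path, trying UTF-8 first, then ISO-8859-1."""
    # Turn the percent-escapes into raw bytes without assuming a charset yet.
    data = unquote_to_bytes(raw_path)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return data.decode("iso-8859-1")


print(decode_path("b%C3%A5t"))  # UTF-8 escapes, as in the first example
print(decode_path("b%E5t"))     # single-byte escape, as sent by Firefox here
```

This is a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so the fallback can guess wrong in rare cases.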

zyron

7:59 pm on Mar 2, 2005 (gmt 0)


I solved the problem myself, with a workaround!

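Set a condition in front of the rule:
RewriteCond %{REQUEST_URI} !Ã
RewriteRule ^(.+)\ (.+) $1\_$2 [N,R]
to prevent it from inserting _ when the URL contains
international characters. (Ã is how the first byte, 0xC3, of a UTF-8 encoded
character such as å appears when read as ISO-8859-1.)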

Those URLs then get %20 for the space instead, so I wrote a JavaScript
that transforms those into _ and sends the result back
to the server.

That way I can write
domain/nå må du høre dæsse oña pö

and the browser displays
domain/nå_må_du_høre_dæsse_oña_pö (!)
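
The JavaScript itself is not shown in this thread. As an illustration only, the string transformation it performs could look like the following Python sketch; `spaces_to_underscores` is a hypothetical name, and the redirect back to the server is left out.

```python
def spaces_to_underscores(raw_path: str) -> str:
    """Replace encoded (%20) and literal spaces with underscores.

    Other percent-escapes, such as %C3%A5 for "å", are left untouched.
    """
    return raw_path.replace("%20", "_").replace(" ", "_")


print(spaces_to_underscores("n%C3%A5%20m%C3%A5%20du"))
print(spaces_to_underscores("blue coat"))
```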

This works only with IE.

Firefox does a urlencode first..

I added browser detection,
so I resolved my Firefox problems as well!

Nice..

Zyron