Rewriting and encoding

         

plemieux

9:47 pm on Aug 29, 2005 (gmt 0)

10+ Year Member



My Apache server uses ISO-8859-1 as its default charset.

I use a basic rewrite rule to map article names to their database IDs.

If I click on a URL-encoded link (e.g. fran%E7ais.htm), I get a 403 error. If I type the URL into the address bar without encoding the special character, the rewrite works and the URL in the address bar changes to "fran%C3%A7ais.htm", which is URL-encoded UTF-8.

So I suppose Apache tries to process the URL as UTF-8. Can someone explain to me why?
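
For anyone comparing the two forms, here is a minimal Python sketch of how the same file name percent-encodes under each charset. The name "français.htm" is inferred from the escaped URLs above. This only illustrates the byte-level difference; it says nothing about which component (browser or server) performs the conversion.

```python
from urllib.parse import quote, unquote

# File name inferred from the escaped URLs in the post (an assumption).
name = "français.htm"

# Percent-encode the same name under each charset.
latin1_url = quote(name, encoding="iso-8859-1")
utf8_url = quote(name, encoding="utf-8")

print(latin1_url)  # fran%E7ais.htm
print(utf8_url)    # fran%C3%A7ais.htm

# The ISO-8859-1 form is not valid UTF-8: byte 0xE7 starts a
# multi-byte sequence, but the next byte is the plain letter "a".
try:
    unquote(latin1_url, encoding="utf-8", errors="strict")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)  # False
```

So a component that expects UTF-8 cannot decode %E7 on its own, while %C3%A7 decodes cleanly to "ç".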

jdMorgan

10:40 pm on Aug 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



plemieux,

Welcome to WebmasterWorld!

> I get a 403 error.

A 403? That is odd. I'd understand a 404, or even a 500, but 403 is a strange response.

The reason Apache is encoding the URL further is that "%" itself is an excluded character [faqs.org].

Jim

plemieux

12:15 pm on Aug 30, 2005 (gmt 0)

10+ Year Member



After doing some research, I have found this article from the W3C:
[w3.org...]
and this one:
[w3.org...]

which states:
We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:
1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
2. Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
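
The two steps quoted above can be sketched in Python roughly as follows. The function name escape_non_ascii is hypothetical, and the sketch follows the quoted convention literally: only non-ASCII characters are escaped, and ASCII characters (including reserved ones such as "?" or "%") are left untouched, which a real URL encoder would have to handle separately.

```python
def escape_non_ascii(text: str) -> str:
    """Escape non-ASCII characters as UTF-8 bytes in %HH form.

    Hypothetical helper illustrating the quoted W3C convention only.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII characters are passed through unchanged.
            out.append(ch)
        else:
            # Step 1: represent the character in UTF-8 (one or more bytes).
            # Step 2: escape each byte as %HH, HH being its hex value.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii("français.htm"))  # fran%C3%A7ais.htm
```

This matches what the address bar shows after the unencoded URL is typed in: "ç" becomes the two UTF-8 bytes C3 A7, each escaped separately.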