PHP form processing with non-Latin-1 characters

How to process forms with non-Western character input

     
3:42 pm on Jun 12, 2003 (gmt 0)

10+ Year Member



We recently localized some websites into Eastern European
versions (namely Polish, Czech, and Russian).

Displaying the web pages in the appropriate character sets
is not the problem, but dealing with the feedback of native
users is.

We are using several forms to generate feedback mails from
our servers. The input should be converted into the e-mail
encodings most commonly used in these languages.

Some questions:

1. Are there any statistics showing which encodings
are predominant in which language?
(e.g., is the typical Czech user more likely to use
ISO 8859-2 (Latin-2) or Windows CP 1250 for their input?
And furthermore, will they be able to display
an e-mail in these character sets?)

2. Is there a way to analyse which encoding the users use
when filling out the form?
(Or are there even prefab modules to use? :-))

3. If the encoding can be detected correctly, how do we
convert the input into the appropriate encoding for the
generated e-mail?
There is the convert_cyr_string function for Russian, but
what about the other charsets?

4:29 pm on Jun 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been playing with this myself. What I'm doing is using Unicode (UTF-8) so I don't have to worry about all the different character sets. It looks like most browsers post form data back using the character set that the page was encoded in. When you send out the email, just specify UTF-8 as the charset in the Content-Type header.

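
A minimal sketch of the "specify UTF-8 in the mail" step, assuming a current PHP version (7.0 or later) and a plain-text message. The helper name build_utf8_mail and the address in the commented-out mail() call are made up for illustration and do not come from this thread. The body is base64-encoded so that 8-bit characters survive mail servers that only handle 7-bit data, and the subject is wrapped as an RFC 2047 encoded-word.

```php
<?php
// Hypothetical helper: prepares subject, headers and body for a UTF-8
// plain-text feedback mail. It only builds strings; nothing is sent here.
function build_utf8_mail(string $subject, string $body): array
{
    $headers = [
        'MIME-Version: 1.0',
        // Tells the mail client which character set the body uses.
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: base64',
    ];

    return [
        // RFC 2047 encoded-word so a non-ASCII subject survives transport.
        // Note: long subjects would have to be split into several
        // encoded-words; this sketch does not do that.
        'subject' => '=?UTF-8?B?' . base64_encode($subject) . '?=',
        'headers' => implode("\r\n", $headers),
        // chunk_split() wraps the base64 text into 76-character lines.
        'body'    => chunk_split(base64_encode($body)),
    ];
}

$mail = build_utf8_mail('Zpětná vazba', 'Příliš žluťoučký kůň');

// To actually send it (recipient address is a placeholder):
// mail('feedback@example.com', $mail['subject'], $mail['body'], $mail['headers']);
```

Whether the recipient can read the result still depends on their mail client understanding UTF-8, which is the open question in this thread.
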
2:48 pm on Jun 15, 2003 (gmt 0)

10+ Year Member



Thanks for the answer, Timotheos.

"What I'm doing is using Unicode (UTF-8) so I don't have to worry about all the different character sets."

That would be a solution but I was under the impression that a great proportion of Eastern European users still run "older" systems that don't support UTF-8 properly yet.
:(

"It looks like most browsers post form data back using the character set that the page was encoded in."

Ah, that one helped me a lot :).
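
Since browsers usually (not always) post back in the page's encoding, one cautious approach is to serve the form page as UTF-8 and then verify that what arrives really is valid UTF-8 before using it. The following is only a sketch under that assumption, for PHP 7.1 or later. The helper name read_utf8_field and the field name "comment" are hypothetical.

```php
<?php
// Hypothetical helper: returns the posted field if it is a valid UTF-8
// string, or null if the field is missing or contains invalid byte
// sequences (for example because the browser sent ISO 8859-2 instead).
function read_utf8_field(array $post, string $name): ?string
{
    if (!isset($post[$name]) || !is_string($post[$name])) {
        return null;
    }

    // With the /u modifier, preg_match() fails (returns false) when the
    // subject is not valid UTF-8, so a result of 1 means "valid".
    return preg_match('//u', $post[$name]) === 1 ? $post[$name] : null;
}

// In a real form handler this would be called with $_POST:
// $comment = read_utf8_field($_POST, 'comment');
```

If the mbstring extension is loaded, mb_check_encoding($value, 'UTF-8') performs the same check. Note that this validates the input against one expected encoding; it does not guess which legacy encoding a user's browser actually used.
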
6:38 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I looked around for a Unicode support chart but couldn't find one, so I'm not sure which browser versions are the cutoff for UTF-8 support. It's always tough deciding how far back you're going to support something and whether it's worth it for that 2-4% of your users. Maybe it's more in your case, but I get a lot of international traffic and those ancient browsers run at about 4%. I often wonder how those poor souls still using Netscape 3/4 get around.

It's interesting to see how other sites are doing it. A good example is Google. Everything there is UTF-8 as far as I can tell. Maybe if I was using an older browser it would be different.

6:13 pm on Jun 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"It's interesting to see how other sites are doing it. A good example is Google. Everything there is UTF-8 as far as I can tell. Maybe if I was using an older browser it would be different."

In fact, this is true. I just surfed to Google using Netscape 4, and the page was not served in UTF-8.

10:35 pm on Jun 18, 2003 (gmt 0)

10+ Year Member



You might want to look at iconv:

[php.net...]
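
For anyone wondering what the iconv route looks like in practice, here is a rough sketch that converts UTF-8 form input into one of the legacy encodings mentioned in the first post. It assumes PHP 7.0 or later with the iconv extension available, and the helper name to_legacy_charset is invented for this example. Characters that do not exist in the target charset make iconv() fail, which the sketch turns into an exception.

```php
<?php
// Hypothetical helper: converts a UTF-8 string to a legacy single-byte
// charset such as ISO-8859-2 or CP1250 before it goes into a mail.
function to_legacy_charset(string $utf8, string $target): string
{
    // iconv() returns false (and raises a notice) when the input contains
    // a character that cannot be represented in the target charset.
    $converted = iconv('UTF-8', $target, $utf8);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert input to $target");
    }
    return $converted;
}

// The same letter ends up as a different byte in each encoding:
echo bin2hex(to_legacy_charset('š', 'ISO-8859-2')), "\n"; // b9
echo bin2hex(to_legacy_charset('š', 'CP1250')), "\n";     // 9a
```

The same call should work for Russian with a target such as KOI8-R or CP1251, so convert_cyr_string is not needed when iconv is available. The Content-Type header of the generated mail then has to name the same charset that was passed as the target.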

10:04 am on Jun 23, 2003 (gmt 0)

10+ Year Member



Thanx for the hint, mischief :-)
 
