
Forum Moderators: coopster & jatar k


PHP form processing with non-Latin-1 characters

How to process forms with non-Western character input

     
3:42 pm on June 12, 2003 (gmt 0)

New User

10+ Year Member

joined:Oct 25, 2002
posts:26
votes: 0


We recently localized some websites into East European
versions (namely Polish, Czech, Russian).

Displaying the web pages in the appropriate character sets
is not the problem, but dealing with the feedback of native
users is.

We are using several forms to generate feedback mails from
our servers. The input should be converted into the most
commonly used e-mail-formats in these languages.

Some questions:

1. Are there any statistics to find out which encodings
are predominant in which language?
(e.g. is the typical Czech user more likely to use
ISO Latin 2 or Windows CP 1250 for his input?
And furthermore will he be able to display
an e-mail in these character-sets?)

2. Is there a way to analyse which encoding the users use
when filling out the form?
(Or are there even prefab modules to use? :-))

3. If the encoding can be detected correctly, how do we
convert the code into the appropriate encoding for the
generated e-mail?
There is the convert_cyr_string function for Russian, but
how about the other charsets?

4:29 pm on June 12, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 13, 2003
posts:775
votes: 0


I've been playing with this myself. What I'm doing is using Unicode (UTF-8) so I don't have to worry about all the different character sets. It looks like most browsers post form data back using the character set that the page was encoded in. When you send out the email, just specify UTF-8 as the charset in the Content-Type header.
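
A minimal sketch of the mail side of this, assuming the form page itself is served as UTF-8 so the posted data arrives as UTF-8. The helper names `build_utf8_headers` and `encode_utf8_subject` and the example addresses are made up for illustration; only `mail()`, `base64_encode()` and `implode()` are standard PHP.

```php
<?php
// Hypothetical helper: headers for a plain-text mail whose body is UTF-8.
// $from should be a fixed address, never raw user input.
function build_utf8_headers(string $from): string
{
    return implode("\r\n", [
        'From: ' . $from,
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: 8bit',
    ]);
}

// Hypothetical helper: wrap a subject as a Base64 "encoded word" so that
// non-ASCII characters survive in the mail header.
function encode_utf8_subject(string $subject): string
{
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

// Usage (placeholder addresses, not executed here):
// mail('feedback@example.com',
//      encode_utf8_subject($subject),
//      $body,
//      build_utf8_headers('webform@example.com'));
```

Very long subjects would need to be split into several encoded words, which this sketch does not attempt.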
2:48 pm on June 15, 2003 (gmt 0)

New User

10+ Year Member

joined:Oct 25, 2002
posts:26
votes: 0


Thanks for the answer, Timotheos.

What I'm doing is using Unicode (UTF-8) so I don't have to worry about all the different character sets.

That would be a solution, but I was under the impression that a large proportion of Eastern European users still run "older" systems that don't support UTF-8 properly yet.
:(

It looks like most browsers post form data back using the character set that the page was encoded in.

Ah, that one helped me a lot :).
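
If the page is served as UTF-8, that behaviour can at least be sanity-checked on the server. The sketch below assumes the mbstring extension is available; `is_valid_utf8` is a made-up helper name. Note that a check like this is only meaningful for UTF-8: almost any byte sequence is "valid" in a single-byte charset such as ISO-8859-2 or Windows-1250, so it cannot tell those two apart.

```php
<?php
// Hypothetical helper: true if the posted value is well-formed UTF-8.
// Relies on mb_check_encoding() from the mbstring extension.
function is_valid_utf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

// Usage (not executed here): declare the page charset before any output,
// then verify what comes back from the form.
// header('Content-Type: text/html; charset=UTF-8');
// if (!is_valid_utf8($_POST['message'] ?? '')) {
//     // The browser did not send UTF-8; reject or handle separately.
// }
```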
6:38 pm on June 16, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 13, 2003
posts:775
votes: 0


I looked around for a Unicode support chart but couldn't find one, so I'm not sure where the breaking point would be in terms of browser support. It's always tough deciding how far back you're going to support something and whether it's worth it for that 2-4% of your users. Maybe it's more in your case, but I get a lot of international traffic and those ancient browsers run at about 4%. I often wonder how those poor souls still using Netscape 3/4 get around.

It's interesting to see how other sites are doing it. A good example is Google. Everything there is UTF-8 as far as I can tell. Maybe if I was using an older browser it would be different.

6:13 pm on June 18, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 13, 2003
posts:775
votes: 0


It's interesting to see how other sites are doing it. A good example is Google. Everything there is UTF-8 as far as I can tell. Maybe if I was using an older browser it would be different.

In fact this is true. I just surfed to Google using Netscape 4 and the page was not served in UTF-8.

10:35 pm on June 18, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 9, 2003
posts:28
votes: 0


You might want to look at iconv:

[php.net...]
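
A rough sketch of what that could look like, assuming PHP was built with the iconv extension and that the source charset is already known (for example because it is the charset the form page was served in). `convert_charset` is a made-up wrapper name, and `ISO-8859-2`, `CP1250` and `UTF-8` are charset names that the usual iconv implementations accept; check what your own build supports.

```php
<?php
// Hypothetical wrapper around iconv(): convert $text from one charset
// to another and fail loudly instead of returning false.
function convert_charset(string $text, string $from, string $to): string
{
    $converted = iconv($from, $to, $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $from to $to");
    }
    return $converted;
}

// Example: "š" is byte 0xB9 in ISO-8859-2 and byte 0x9A in Windows-1250.
$latin2 = "\xB9";
$utf8   = convert_charset($latin2, 'ISO-8859-2', 'UTF-8');
$cp1250 = convert_charset($utf8, 'UTF-8', 'CP1250');
```

The same call covers the Russian case as well (for instance between `KOI8-R` and `CP1251`), so one code path can serve all three languages.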

10:04 am on June 23, 2003 (gmt 0)

New User

10+ Year Member

joined:Oct 25, 2002
posts:26
votes: 0


Thanks for the hint, mischief :-)