PHP form processing with non-Latin-1 characters - PHP Server Side Scripting forum at WebmasterWorld - WebmasterWorld

Forum Moderators: coopster

PHP form processing with non-Latin-1 characters

How to process forms with non-Western character input

antizwerg

3:42 pm on Jun 12, 2003 (gmt 0)

We recently localized some websites into Eastern European
versions (namely Polish, Czech, Russian).

Displaying the web pages in the appropriate character sets
is not the problem, but dealing with the feedback of native
users is.

We are using several forms to generate feedback mails from
our servers. The input should be converted into the most
commonly used e-mail-formats in these languages.

Some questions:

1. Are there any statistics to find out which encodings
are predominant in which language?
(e.g. is the typical Czech user more likely to use
ISO Latin 2 (ISO-8859-2) or Windows CP 1250 for input?
And will that user be able to display
an e-mail in these character sets?)

2. Is there a way to analyse which encoding the users use
when filling out the form?
(Or are there even prefab modules to use? :-))

3. If the encoding can be detected correctly, how do we
convert the input into the appropriate encoding for the
generated e-mail?
There is the convert_cyr_string function for Russian, but
how about the other charsets?

Timotheos

4:29 pm on Jun 12, 2003 (gmt 0)

I've been playing with this myself. What I'm doing is using Unicode (UTF-8) so I don't have to worry about all the different character sets. It looks like most browsers post form data back using the character set that the page was encoded in. When you send out the email, just specify the charset in the Content-Type header as UTF-8.
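
To make the UTF-8 approach concrete, here is a minimal sketch of building the mail headers in PHP. The function names `build_utf8_mail_headers` and `encode_utf8_subject` and the example addresses are made up for illustration, and it assumes the form page itself is served as UTF-8 so the posted data already arrives in that encoding.

```php
<?php
// Hypothetical helper: headers for a plain-text mail whose body is UTF-8.
function build_utf8_mail_headers(string $fromAddress): array
{
    return [
        'From: ' . $fromAddress,
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: 8bit',
    ];
}

// Hypothetical helper: wrap a UTF-8 subject as an RFC 2047 encoded word.
// Only suitable for short subjects; long ones must be split into
// several encoded words, which this sketch does not do.
function encode_utf8_subject(string $subject): string
{
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

$headers = build_utf8_mail_headers('feedback@example.com');
$subject = encode_utf8_subject('Feedback');

// Sending is left commented out; $body would hold the posted form text.
// mail('webmaster@example.com', $subject, $body, implode("\r\n", $headers));
```

The subject is encoded separately because mail headers are not covered by the body's Content-Type declaration.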

antizwerg

2:48 pm on Jun 15, 2003 (gmt 0)

Thanks for the answer, Timotheos.

"What I'm doing is using Unicode (UTF-8) so I don't have to worry about all the different character sets."

That would be a solution, but I was under the impression that a large proportion of Eastern European users still run "older" systems that don't yet support UTF-8 properly.
:(

"It looks like most browsers post form data back using the character set that the page was encoded in."

Ah, that one helped me a lot :).
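
If the page is served as UTF-8, one rough way to check that a submitted value really arrived in that encoding is a validity test. The sketch below assumes the mbstring extension is available; the helper name `is_valid_in_charset` is made up for illustration. Note that this only confirms well-formed UTF-8. It cannot tell single-byte sets such as ISO-8859-2 and Windows-1250 apart, since almost any byte sequence is valid in both.

```php
<?php
// Hypothetical helper: true if $value is well-formed in $charset.
// Requires the mbstring extension.
function is_valid_in_charset(string $value, string $charset): bool
{
    return mb_check_encoding($value, $charset);
}

// "Łódź" written out as raw bytes so the example does not depend on
// how this source file itself is saved.
$utf8Bytes   = "\xC5\x81\xC3\xB3d\xC5\xBA"; // UTF-8
$latin2Bytes = "\xA3\xF3d\xBC";             // ISO-8859-2

$utf8Ok   = is_valid_in_charset($utf8Bytes, 'UTF-8');   // well-formed UTF-8
$latin2Ok = is_valid_in_charset($latin2Bytes, 'UTF-8'); // not well-formed UTF-8
```

A value that fails the check could be rejected or logged rather than passed on into the generated mail.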

Timotheos

6:38 pm on Jun 16, 2003 (gmt 0)

I looked around for a Unicode support chart but couldn't find one, so I'm not sure where the breaking point would be in terms of browser support. It's always tough deciding how far back you're going to support something and whether it's worth it for that 2-4% of your users. Maybe it's more in your case, but I get a lot of international traffic and those ancient browsers run at about 4%. I often wonder how those poor souls still using Netscape 3/4 get around.

It's interesting to see how other sites are doing it. A good example is Google. Everything there is UTF-8 as far as I can tell. Maybe if I was using an older browser it would be different.

Timotheos

6:13 pm on Jun 18, 2003 (gmt 0)

"It's interesting to see how other sites are doing it. A good example is Google. Everything there is UTF-8 as far as I can tell. Maybe if I was using an older browser it would be different."

In fact, this is true. I just surfed to Google using Netscape 4 and the page was not served in UTF-8.

mischief

10:35 pm on Jun 18, 2003 (gmt 0)

You might want to look at iconv:
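
As a rough illustration of the iconv suggestion, the sketch below converts form input from one charset to another before it goes into the mail. It assumes PHP's iconv extension is installed and that the source encoding is already known (for example, the charset the form page was served in). The helper name `convert_for_mail` is made up for illustration.

```php
<?php
// Hypothetical helper: convert $text from charset $from to charset $to.
// iconv() returns false when the input contains bytes that are illegal
// in $from or characters that cannot be represented in $to.
function convert_for_mail(string $text, string $from, string $to): string
{
    $converted = @iconv($from, $to, $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert text from $from to $to");
    }
    return $converted;
}

// "Łódź" as UTF-8 bytes, converted to ISO-8859-2 (ISO Latin 2).
$utf8   = "\xC5\x81\xC3\xB3d\xC5\xBA";
$latin2 = convert_for_mail($utf8, 'UTF-8', 'ISO-8859-2');
```

The mail's Content-Type header would then need to name the target charset so the recipient's mail client decodes the body correctly.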

antizwerg

10:04 am on Jun 23, 2003 (gmt 0)

Thanks for the hint, mischief :-)