Forum Moderators: coopster
The website is in 9 languages (EN, FR, DE, ES, IT, RU, JP, CN & KR). Page encoding is Unicode (UTF-8)...
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
...on all but JP, CN & KR.
Clearly it is an encoding problem, but I have looked around and cannot really see where to start to fix the problem. My PHP form submits to an external PHP file, which in turn checks configuration information on an external text file.
If I cannot get this to work with the European languages then there is no hope for the Asian forms.
One option is $inputB=htmlentities($input), which would result in M&uuml;nchen, and could be brought back to normal with $input=html_entity_decode($inputB). (htmlspecialchars() itself only escapes &, <, > and quotes, so it leaves ü untouched.)
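A minimal sketch of that entity round trip, assuming the input arrives as UTF-8. The variable names follow the post; passing the charset explicitly is my assumption, since older PHP versions default to ISO-8859-1. It uses htmlentities() because that is the function that turns ü into an entity, whereas htmlspecialchars() lets it pass through.

```php
<?php
// Assumption: $input is UTF-8. "\xC3\xBC" is the UTF-8 byte pair for ü,
// written as escapes so the example does not depend on how this file is saved.
$input = "M\xC3\xBCnchen";

// htmlentities() converts every character that has a named HTML entity.
$inputB = htmlentities($input, ENT_QUOTES, 'UTF-8');

// html_entity_decode() reverses it; the charset argument must match.
$restored = html_entity_decode($inputB, ENT_QUOTES, 'UTF-8');

// htmlspecialchars() only touches & < > and quotes, so ü passes through as is.
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $inputB, "\n"; // M&uuml;nchen
```

If the charset argument does not match the real encoding of the input, the two bytes of ü are treated as two separate characters, which is one common source of garbled output.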
My Red Hat 7.3 Linux / PHP 4.3.2 box didn't do what your system did. Here's my sample code:
<form action="<?php echo $_SERVER["SCRIPT_NAME"]; ?>" method="post">
<?php
if (!empty($_POST["field1"])) {
    // Echo the submitted value back into the field and as plain text next to it.
    echo "<input type=text name='field1' value='".$_POST["field1"]."' /> ".$_POST["field1"]."<br />\n";
    // Mail the same value so it can be checked in a mail client as well.
    $to="myemail@example.com";
    $subject="Testing oddballs";
    $message=$_POST["field1"];
    $headers="From: myadmin@example.com\r\n";
    mail($to,$subject,$message,$headers);
} else {
    echo "<input type=text name='field1' />\n";
}
?>
</form>
When München is input into the field, the output remains München both in the field and in the HTML text next to it when the form is submitted, and in the email I received on my Win98 box using the Calypso mail client. What kind of server and/or version of PHP are you running?
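The test script sends the mail without declaring a charset, so how the message displays depends on the mail client's default. A hedged sketch of declaring UTF-8 explicitly follows. The two helper functions are hypothetical names of my own, not part of the thread, and the sketch assumes the form data really is UTF-8.

```php
<?php
// Hypothetical helper: builds the extra headers for a plain-text UTF-8 message.
// Assumes $from is a plain ASCII address.
function utf8_mail_headers($from)
{
    return "From: " . $from . "\r\n"
         . "MIME-Version: 1.0\r\n"
         . "Content-Type: text/plain; charset=UTF-8\r\n"
         . "Content-Transfer-Encoding: 8bit\r\n";
}

// Hypothetical helper: wraps a UTF-8 subject in an RFC 2047 encoded-word.
// Only suitable for short subjects, since one encoded-word may not exceed 75 characters.
function utf8_mail_subject($subject)
{
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

$to      = "myemail@example.com";
$subject = utf8_mail_subject("Testing oddballs: M\xC3\xBCnchen");
$message = "M\xC3\xBCnchen"; // ü written as its UTF-8 bytes
$headers = utf8_mail_headers("myadmin@example.com");

// Uncomment on a server with a configured mail transport:
// mail($to, $subject, $message, $headers);
```

The mail() call is left commented out so the sketch can be run without a mail server.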
PHP will only handle certain charset-encodings [php.net] internally:
Encodings of the following types are safely used with PHP:
- A singlebyte encoding:
  - which has ASCII-compatible (ISO646 compatible) mappings for the characters in range of 00h to 7fh.
- A multibyte encoding:
  - which has ASCII-compatible mappings for the characters in range of 00h to 7fh.
  - which don't use ISO2022 escape sequences.
  - which don't use a value from 00h to 7fh in any of the compounded bytes that represents a single character.
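To make the last criterion concrete, here is a small sketch that checks whether any byte of a multibyte character falls in the range 00h to 7fh. The helper name is hypothetical. The two samples are ü in UTF-8 (C3h BCh) and the katakana ソ in Shift_JIS (83h 5Ch), whose second byte is the same as the ASCII backslash.

```php
<?php
// Hypothetical helper: true if any byte of $bytes lies in the ASCII range 00h-7fh.
function has_ascii_range_byte($bytes)
{
    for ($i = 0; $i < strlen($bytes); $i++) {
        if (ord($bytes[$i]) <= 0x7F) {
            return true;
        }
    }
    return false;
}

$u_umlaut_utf8 = "\xC3\xBC"; // ü in UTF-8: both bytes are 80h or above
$so_sjis       = "\x83\x5C"; // ソ in Shift_JIS: 5Ch is also the ASCII backslash

var_dump(has_ascii_range_byte($u_umlaut_utf8)); // bool(false)
var_dump(has_ascii_range_byte($so_sjis));       // bool(true)
```

Every byte of a multibyte UTF-8 sequence is 80h or above, so UTF-8 passes this particular test, while Shift_JIS does not.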
I have read of problems with PHP using UTF-8 encoding internally (sorry, I cannot give a reference now) and have therefore made a mental note to stick with ISO-8859-1 encoding. This will clearly be determined by both the server encoding and the page-script encoding (another mental note: someone reported that his page-script encoding determined the PHP internal encoding).
You are now dealing with quite a chain of transference:
"I have looked around and cannot really see where to start to fix the problem"
The above should fix that!
[webmasterworld.com...]
Look down the whole thread first as much of the meat is towards the bottom and coopster assembled a bunch of relevant links at the very end.
One problem that may be very hard to surmount is when your users carelessly paste in text that isn't displaying the characters as they expect, but they don't notice and send it as is.
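One partial safeguard, sketched here under the assumption that the form is meant to submit UTF-8, is to check on the server that the bytes received are well-formed UTF-8 before storing or mailing them. The function name is hypothetical. This only catches text sent in a different byte encoding; it cannot detect text that was already garbled before being pasted and is still valid UTF-8.

```php
<?php
// Hypothetical helper: true if $text is a well-formed UTF-8 byte sequence.
// Relies on PCRE's /u modifier, under which preg_match() fails on invalid UTF-8.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

var_dump(is_valid_utf8("M\xC3\xBCnchen")); // ü as UTF-8 bytes
var_dump(is_valid_utf8("M\xFCnchen"));     // ü as a single ISO-8859-1 byte
```

A failed check could be used to redisplay the form with a warning instead of silently passing the text on.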
I have a fairly lengthy post on this problem.
Wow! *That*'s a good post! Thank goodness somebody with direct experience of this issue answered!
I am also going to take a little space to promote my Class on RFC-Compliant Request/Response Headers [webmasterworld.com] (roundly ignored by everybody!) since it offers the chance to programmatically discover the charsets available/sent. One item that I wanted to add to the Class was the ability to auto-convert charsets, but I felt I had insufficient experience at this point to tackle it. Hence my interest in this topic, and thanks once again for your posting.
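As a rough illustration of discovering the charsets a client says it accepts, here is a sketch that parses an Accept-Charset header value. It is not the Class mentioned above. The function name and the sample header string are assumptions, and in a real request the value would come from $_SERVER['HTTP_ACCEPT_CHARSET'] when the client sends it.

```php
<?php
// Hypothetical helper: parses an Accept-Charset header value into
// charset => q pairs, ordered from highest to lowest q.
function parse_accept_charset($header)
{
    $result = array();
    foreach (explode(',', $header) as $part) {
        $pieces  = explode(';', trim($part));
        $charset = strtolower(trim($pieces[0]));
        if ($charset === '') {
            continue; // skip empty entries
        }
        $q = 1.0; // default quality when no q parameter is given
        for ($i = 1; $i < count($pieces); $i++) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $pieces[$i], $m)) {
                $q = (float) $m[1];
            }
        }
        $result[$charset] = $q;
    }
    arsort($result); // sort by q descending, keeping the charset keys
    return $result;
}

// Sample header value, chosen for illustration only.
$charsets = parse_accept_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.3");
print_r($charsets);
```

Not every client sends this header, so a missing value should be treated as "unknown" rather than as a refusal of any charset.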