I'm having problems with text encoding.
An accented é appears correctly as é on the external page and in its HTML source. Once the script has stored the string, it outputs it as "é". The same character read from the database is output as "ˆ©".
Looking at the database text file, it appears as "?©". I have tried saving the text file in every available text encoding; each produces a different result, but none is what I'm looking for.
The script uses shell commands, which could be affecting the encoding: when I run the same commands from my own shell I get the accented é. In the script, once the text has been stored in variables via the shell commands, neither str_replace nor preg_replace can modify the "é" or "ˆ©" parts of a string.
I'm looking for a way to get a match between the string in the database and the one from the external page.
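One way to see why two strings that look alike refuse to match is to compare their bytes rather than how they display. The following is a minimal sketch, not a diagnosis of the actual data: the strings and variable names are made up, and it assumes the page text is UTF-8 while the search term is ISO-8859-1.

```php
<?php
// Hypothetical strings: the same letter in two encodings, written as raw
// bytes so the example does not depend on how this file itself is saved.
$utf8   = "\xC3\xA9"; // é encoded as UTF-8 (two bytes)
$latin1 = "\xE9";     // é encoded as ISO-8859-1 (one byte)

// bin2hex() shows the stored bytes regardless of display encoding.
echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($latin1), "\n"; // e9

// str_replace() compares bytes, so a search term in one encoding
// does not match text stored in the other.
$page  = "caf" . $utf8;
$count = 0;
str_replace($latin1, "e", $page, $count);
echo $count, "\n"; // 0

// With the search term in the same encoding as the text, it matches.
$fixed = str_replace($utf8, "e", $page, $count);
echo $fixed, " ", $count, "\n"; // cafe 1
```

Running bin2hex() on the string from the database and on the string from the external page should show whether they really contain the same bytes.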
I tried this: I placed é in <input type="text" name="a"> and then wrote in PHP:
<?php
$a = $_POST["a"];
print("a: $a");
?>
and it did write é.
So I don't know what it is that you have.
Sorry.
If it is MySQL, take a read through this:
MySQL Character Set Support [dev.mysql.com]
Using a simple form returns the same accented é character.
I do not have any encoding set on this page. I've tried different page encodings combined with different encodings of the text files, and none have worked.
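For comparison, here is a sketch of a test form that states its encoding explicitly instead of leaving the browser to guess. It assumes UTF-8 is the target encoding, which this thread has not established, and renderForm is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds a small test page that declares its encoding.
// Assumption: the site standardises on UTF-8.
function renderForm(string $value): string
{
    // Escape for HTML and tell htmlspecialchars() the input is UTF-8.
    $safe = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    return '<!DOCTYPE html>' . "\n"
        . '<html><head><meta charset="UTF-8"><title>Encoding test</title></head>' . "\n"
        . '<body><form method="post" accept-charset="UTF-8">' . "\n"
        . '<input type="text" name="a" value="' . $safe . '">' . "\n"
        . '<input type="submit">' . "\n"
        . '</form></body></html>' . "\n";
}

// When served over HTTP, a matching header would be sent before any output:
// header('Content-Type: text/html; charset=UTF-8');

$html = renderForm("\xC3\xA9"); // é as UTF-8 bytes
echo $html;
```

With the encoding declared in the page and on the form, the bytes that arrive in $_POST should be in that same encoding, which removes one variable from the comparison.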
Using the same commands and database through my shell, the characters appear properly, but that's the bash shell of OS X so I think it's applying Mac OS Roman encoding.
This enabled me to save them as Latin encodings. I tried ISO Latin 1, ISO Latin 9, Windows Latin 1 and Mac OS Roman.
Even with the characters as é instead of ?©, they still wouldn't appear correctly; the only version that did was the one saved with Mac OS Roman encoding. I use BBEdit to do the encodings, by the way. Using "iso-8859-1" as the page encoding gave the same result as defining no encoding. This could just be how the browser (Safari) handles the encoding; I do not know much about encoding.
How it displays isn't really that important. What matters is how the script differentiates between the "é" and the "ˆ©" (the character appears in the text file as "?©"). It's also important that I can use PHP functions like str_replace and preg_replace to modify these characters; at the moment PHP will not replace them.
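If the "é" is a UTF-8 é whose two bytes were each read as a single-byte character (a guess, since the raw bytes have not been posted), the effect can be reproduced and reversed. A sketch, assuming the mbstring extension is available:

```php
<?php
// Assumes the mbstring extension is loaded.
$original = "\xC3\xA9"; // é as UTF-8

// Reproduce the problem: treat each UTF-8 byte as an ISO-8859-1 character
// and re-encode the result as UTF-8. A UTF-8 page shows this as "é".
$mojibake = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');
echo bin2hex($mojibake), "\n"; // c383c2a9

// Reverse it by converting the other way.
$repaired = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original); // bool(true)

// mb_check_encoding() can tell valid UTF-8 from a lone ISO-8859-1 byte.
var_dump(mb_check_encoding("\xC3\xA9", 'UTF-8')); // bool(true)
var_dump(mb_check_encoding("\xE9", 'UTF-8'));     // bool(false)
```

Once both strings are in the same encoding, str_replace and preg_replace compare like with like; for preg_replace on UTF-8 text the u pattern modifier is also worth a look.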
There was a thread here once about encodings entitled something like "There's no such thing as 'plain text'"... it sounds like your situation is a case study in that!
I'm surprised that the Mac OS Roman encoding is the one that works. I thought the Mac version of ISO-8859-1 was very similar to Windows-1252 and caused problems with lots of punctuation (curly quotes for example).
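The difference between these code pages shows up when the same bytes are decoded with each one. The sketch below uses a hand-written excerpt of three tables (only a few bytes, not full code pages) and a hypothetical decodeWith helper. It assumes this file is saved as UTF-8 and that the stored bytes are C3 A9, which nothing in this thread confirms.

```php
<?php
// Hand-written excerpts for illustration, not complete code pages.
$tables = [
    'Mac OS Roman' => [0x8E => 'é', 0xA9 => '©', 0xC3 => '√', 0xD2 => '“', 0xD3 => '”'],
    'Windows-1252' => [0x93 => '“', 0x94 => '”', 0xA9 => '©', 0xC3 => 'Ã', 0xE9 => 'é'],
    'ISO-8859-1'   => [0xA9 => '©', 0xC3 => 'Ã', 0xE9 => 'é'],
];

// Hypothetical helper: decode a byte string with one of the excerpts above.
function decodeWith(array $table, string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        if ($code < 0x80) {
            $out .= $byte;                 // ASCII is the same in all three
        } else {
            $out .= $table[$code] ?? '?';  // '?' for bytes not in the excerpt
        }
    }
    return $out;
}

// Decode the UTF-8 bytes of é with each table.
foreach ($tables as $name => $table) {
    echo $name, ': ', decodeWith($table, "\xC3\xA9"), "\n";
}
```

Under that assumption, Mac OS Roman shows "√©" and the two Latin encodings show "é". The curly quotes sit at 0xD2 and 0xD3 in Mac OS Roman but at 0x93 and 0x94 in Windows-1252, and ISO-8859-1 has only control codes at 0x93 and 0x94, which is where the punctuation trouble comes from.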
Any ideas?
Tom