The page has a correct charset (iso 8859-15) in the meta tag of the header.
But this is not important, since the page receives an already decoded entity.
To make this clear, I'll use an example. The string to be handled by html_entity_decode():
comment &ecirc;tre s&ucirc;r du paiement?
should appear as:
comment être sûr du paiement?
in the source code of the webpage sent to the browser.
And that worked exactly so on our former cPanel shared server.
Recently we moved to a VPS, and in the source code (and on screen), the same string reads:
comment кtre sыr du paiement?
Note: the forum re-encodes all entities in this post, but the source code of my page really does contain these weird Cyrillic characters.
I guess that this has something to do with the set-up of PHP on the VPS server, so I had a look at the php.ini file, and it has no default charset defined, which is the standard setup.
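Since the two php.ini files can no longer be compared, the effective setting can be read at runtime on the VPS instead. A minimal sketch, assuming it is run as a standalone script on the new server; the helper name describe_default_charset() is hypothetical. The PHP version matters as much as the ini file here: when the charset argument is omitted, html_entity_decode() assumed ISO-8859-1 before PHP 5.4, UTF-8 in 5.4 and 5.5, and follows default_charset from PHP 5.6 on.

```php
<?php
// Hypothetical helper: report the charset PHP actually uses at runtime,
// regardless of what is (or is not) written in php.ini.
function describe_default_charset()
{
    $charset = ini_get('default_charset');
    // ini_get() returns false for unknown options and '' when the value is empty.
    return ($charset === false || $charset === '') ? '(not set)' : $charset;
}

echo 'PHP version:     ', PHP_VERSION, "\n";
echo 'default_charset: ', describe_default_charset(), "\n";
```

Running the same script on any other server you still have access to gives a rough substitute for the lost php.ini comparison.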
I was not able to compare the 2 php.ini files, since the previous account was cancelled before I spotted this erroneous behavior.
Could someone set me on the track to solve this problem?
Thanx.
Notawiz
Have you tried placing the charset directly into the html_entity_decode() [us2.php.net] function call?
html_entity_decode($string,ENT_COMPAT,"ISO8859-15"); or
html_entity_decode($string,ENT_COMPAT,"ISO-8859-15"); ?
From your notation in your message, perhaps defining "ISO 8859-15" in the headers is not what your new version of PHP needs to see?
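To judge whether the explicit charset makes a difference, it may help to look at the decoded bytes rather than at the rendered page. A small sketch, assuming the stored string uses the named entities &ecirc; and &ucirc; (the variable names are only illustrative):

```php
<?php
// Assumed sample: the entity-encoded string as stored in the database.
$stored = 'comment &ecirc;tre s&ucirc;r du paiement?';

// Decode with the target charset spelled out, so the result does not
// depend on PHP's version-specific default.
$decoded = html_entity_decode($stored, ENT_COMPAT, 'ISO-8859-15');

// In ISO-8859-15, ê is the single byte 0xEA and û is the single byte 0xFB.
echo bin2hex($decoded), "\n";
var_dump($decoded === "comment \xEAtre s\xFBr du paiement?");
```

If the comparison holds but the page still shows Cyrillic, the decoding step is probably fine and the problem sits later, in how the bytes are labelled or displayed.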
The last sample should not contain entities. In the source code of the page, as well as on display, there are 2 Cyrillic characters (that is, the entities were decoded by PHP before the markup of the page was generated).
The string in the database is the first one, with &ecirc; and &ucirc;.
But when writing my post here, the cgi of webmasterworld encoded them again as entities. I tried to paste the wrongly decoded string exactly as it appears in the markup, but to no avail.
I also put the question to my hosting company, but they are as puzzled as I am.
It MUST be a silly detail, but which...
Thanx
Notawiz
I experimented with the &ecirc; (French) entity and the entity for к (Cyrillic), both from a db and from a string. The results were consistent:
$str1f = "comment &ecirc;tre";
$str2f = htmlentities("comment être");
echo "1F: " . $str1f . "<br />\n";
echo "2F: " . $str2f . "<br />\n";
$str1c = "comment кtre";
$str2c = htmlentities("comment кtre");
echo "1C: " . $str1c . "<br />\n";
echo "2C: " . $str2c . "<br />\n";
// Decode with PHP's default charset.
echo "1Fd: " . html_entity_decode($str1f) . "<br />\n";
echo "2Fd: " . html_entity_decode($str2f) . "<br />\n";
echo "1Cd: " . html_entity_decode($str1c) . "<br />\n";
echo "2Cd: " . html_entity_decode($str2c) . "<br />\n";
// Decode with an explicit charset (ENT_COMPAT is a constant, not a string).
echo "1Fd: " . html_entity_decode($str1f, ENT_COMPAT, "ISO8859-15") . "<br />\n";
echo "2Fd: " . html_entity_decode($str2f, ENT_COMPAT, "ISO8859-15") . "<br />\n";
echo "1Cd: " . html_entity_decode($str1c, ENT_COMPAT, "ISO8859-15") . "<br />\n";
echo "2Cd: " . html_entity_decode($str2c, ENT_COMPAT, "ISO8859-15") . "<br />\n";
All with predictable results: the French entities came out as French characters and the Cyrillic entities came out as Cyrillic characters.
So ... in your case:
1) You have &ecirc; in a string in your database.
2) You have html_entity_decode($row["frenchstuff"]) in your page that reads from the db.
3) On the old server, the result was French characters.
4) On the new server, the result is Cyrillic characters, without any changes to the db or the code.
Is that the issue?
What do you think the odds are that it's the database that has a different default charset?
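The symptom itself points to a charset mismatch somewhere along the chain: the bytes that ISO-8859-15 uses for ê and û are the same bytes that Windows-1251 uses for к and ы. A small sketch to show this, assuming the iconv extension is available and knows the names ISO-8859-15 and CP1251 (true for common builds, but worth checking on your server):

```php
<?php
// The two bytes a correct ISO-8859-15 decode produces for ê and û.
$bytes = "\xEA\xFB";

// Interpret the same bytes under two different charsets, converting to
// UTF-8 so the results can be compared safely.
$asLatin9   = iconv('ISO-8859-15', 'UTF-8', $bytes);
$asCyrillic = iconv('CP1251', 'UTF-8', $bytes);

echo bin2hex($asLatin9), "\n";   // UTF-8 bytes of "êû"
echo bin2hex($asCyrillic), "\n"; // UTF-8 bytes of "кы"
```

If that matches what you see, something between the database and the browser may be treating the text as Windows-1251. That is only one possible reading; it does not say which component is responsible.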
We found a way to solve it.
I echoed the output of get_html_translation_table(HTML_ENTITIES); and got a completely wrong charset there. Not very helpful to others, I fear, since we have a solution without knowing the exact cause of the error.
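For anyone who wants to repeat that check, the table can also be built for an explicitly named charset and inspected byte by byte. This is a sketch rather than the fix we applied; it assumes PHP 5.3.4 or later, where get_html_translation_table() accepts a third (encoding) argument:

```php
<?php
// Build the table for a named charset instead of relying on the server
// default. Keys are characters in that charset, values are entities.
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-15');

// Find the character that maps to &ecirc; and print it as hex.
$char = array_search('&ecirc;', $table, true);
echo ($char === false ? 'not found' : bin2hex($char)), "\n";
```

Comparing this with the output of the same call without the third argument should show whether the server default is the odd one out.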
Indeed, on the cPanel server there was no default charset defined either, but the entities were always decoded correctly.
Weird.
Thanks for thinking along with me.
Notawiz.