Forum Moderators: coopster


error on html_entity_decode() function

charset problem


Notawiz

5:17 pm on May 29, 2005 (gmt 0)

10+ Year Member



In one of our scripts, we run html_entity_decode() on a string containing entities before sending it to the browser.

The page declares the correct charset (iso 8859-15) in a meta tag in the head.
But that should not matter here, since the page receives an already-decoded character.

To make this clear, here is an example. The string passed to html_entity_decode() is:


comment &ecirc;tre s&ucirc;r du paiement?

and it should read:
comment être sûr du paiement?
in the source of the page sent to the browser.

That is exactly how it worked on our former cPanel shared server.

Recently we moved to a VPS, and now, in the source code (and on screen), the same string reads:

comment кtre sыr du paiement?

Note: all entities are written as entities in this post, but my actual page source contains these two strange Cyrillic characters.
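One way to see where these two particular letters could come from (a hypothesis, not something established in this thread): in ISO-8859-15, ê and û are the single bytes 0xEA and 0xFB, and in Windows-1251 (cp1251) those same two bytes are the Cyrillic letters к and ы. If the decoded bytes were displayed as Windows-1251, "être sûr" would turn into exactly "кtre sыr". The sketch below uses html_entity_decode() itself to compare the bytes; the choice of cp1251 is an assumption.

```php
<?php
// Hypothetical check: decode each French entity as ISO-8859-15 and the
// Cyrillic letter seen on the VPS as Windows-1251 (cp1251), then compare bytes.
$pairs = [
    '&ecirc;' => '&#1082;', // ê  vs  к (U+043A)
    '&ucirc;' => '&#1099;', // û  vs  ы (U+044B)
];

$results = [];
foreach ($pairs as $french => $cyrillic) {
    $frenchByte   = html_entity_decode($french, ENT_QUOTES, 'ISO-8859-15');
    $cyrillicByte = html_entity_decode($cyrillic, ENT_QUOTES, 'cp1251');
    // Store both bytes as hex so they can be compared and printed.
    $results[$french] = [bin2hex($frenchByte), bin2hex($cyrillicByte)];
    printf("%s -> 0x%s, %s -> 0x%s\n", $french, $results[$french][0], $cyrillic, $results[$french][1]);
}
```

Each pair should print the same byte twice (0xea, then 0xfb), i.e. one byte value, two different letters depending on the charset used to read it.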

I guess this has something to do with the PHP setup on the VPS, so I looked at the php.ini file: it has no default_charset defined, which is the standard setup.

I can't compare the two php.ini files, because the previous account was cancelled before I noticed this behavior.

Could someone point me in the right direction to solve this problem?
Thanx.

Notawiz

StupidScript

5:41 pm on May 31, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmmm... Perhaps this is a charset issue.

Have you tried passing the charset directly to html_entity_decode() [us2.php.net]?

html_entity_decode($string,ENT_COMPAT,"ISO8859-15");

or

html_entity_decode($string,ENT_COMPAT,"ISO-8859-15");

?

Going by the notation in your message ("iso 8859-15"), perhaps that spelling in the headers is not what your new version of PHP expects to see?
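A minimal sketch of that suggestion, using the sample string from the first post: with the charset given explicitly, ê and û decode to single ISO-8859-15 bytes, whereas UTF-8 needs two bytes for each. Note that ENT_COMPAT is a PHP constant and must not be quoted.

```php
<?php
// Sketch: decode the sample string with an explicit charset instead of
// relying on PHP's default charset for html_entity_decode().
$string = 'comment &ecirc;tre s&ucirc;r du paiement?';

$latin = html_entity_decode($string, ENT_COMPAT, 'ISO-8859-15');
$utf8  = html_entity_decode($string, ENT_COMPAT, 'UTF-8');

// Same 29 characters either way, but the accented letters take one byte
// each in ISO-8859-15 and two bytes each in UTF-8.
echo strlen($latin), ' bytes in ISO-8859-15, ', strlen($utf8), " bytes in UTF-8\n";

// Byte offset 8 is the e-circumflex in both decoded strings.
echo 'ecirc as ISO-8859-15: 0x', bin2hex(substr($latin, 8, 1)), "\n";
echo 'ecirc as UTF-8:       0x', bin2hex(substr($utf8, 8, 2)), "\n";
```

Run from the command line, this prints 29 and 31 bytes, then 0xea and 0xc3aa.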

StupidScript

8:28 pm on Jun 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nope. Not a charset issue.

Where did

comment кtre sыr du paiement?
come from? In other words, how does that end up being the string you are trying to decode?

Notawiz

5:53 pm on Jun 2, 2005 (gmt 0)

10+ Year Member



Hello S...script,

The last sample is not supposed to contain entities. Both in the page source and on screen there are two Cyrillic characters (that is, PHP has already decoded them before generating the markup of the page).
The string in the database is the first one, with the entities for ê and û.

But when I wrote my post here, the WebmasterWorld CGI encoded them as entities again. I tried to paste the wrongly decoded string exactly as it appears in the markup, but to no avail.

I also asked my hosting company, but they are as puzzled as I am.

It MUST be a silly detail, but which...

Thanx
Notawiz

StupidScript

7:22 pm on Jun 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Notawiz ...

I experimented with the

&ecirc;
(French) entity and the
&#1082;
(Cyrillic) entity, reading both from a db and from a string.

The results were consistent:

$str1f="comment &ecirc;tre";

$str2f=htmlentities("comment être");

echo "1F: ".$str1f."<br />\n";

echo "2F: ".$str2f."<br />\n";

$str1c="comment &#1082;tre";

$str2c=htmlentities("comment &#1082;tre");

echo "1C: ".$str1c."<br />\n";

echo "2C: ".$str2c."<br />\n";

echo "1Fd: ".html_entity_decode($str1f)."<br />\n";

echo "2Fd: ".html_entity_decode($str2f)."<br />\n";

echo "1Cd: ".html_entity_decode($str1c)."<br />\n";

echo "2Cd: ".html_entity_decode($str2c)."<br />\n";

// Same four strings again, this time with ENT_COMPAT (a constant, not a string) and an explicit charset
echo "1Fd: ".html_entity_decode($str1f,ENT_COMPAT,"ISO8859-15")."<br />\n";

echo "2Fd: ".html_entity_decode($str2f,ENT_COMPAT,"ISO8859-15")."<br />\n";

echo "1Cd: ".html_entity_decode($str1c,ENT_COMPAT,"ISO8859-15")."<br />\n";

echo "2Cd: ".html_entity_decode($str2c,ENT_COMPAT,"ISO8859-15")."<br />\n";

All with predictable results: the French entities came out as French characters and the Cyrillic entities came out as Cyrillic characters.

So ... in your case:

1) You have

&ecirc;
in a string in your database.

2) You have

html_entity_decode($row["frenchstuff"])
in your page that reads from the db.

3) On the old server, the result was French characters.

4) On the new server the result is Cyrillic characters, without any changes to the db or the code.

Is that the issue?

What do you think the odds are that it's the database that has a different default charset?

Notawiz

10:58 pm on Jun 2, 2005 (gmt 0)

10+ Year Member



Hello S...Script,

We found a way to solve it.
I echoed the output of

get_html_translation_table(HTML_ENTITIES);
and got a completely wrong charset.
Since there was no default_charset defined in php.ini, I'm not sure where the erroneous entity translation table came from.
But setting default_charset to iso-8859-15 restored the translation table to normal.
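A small sketch of what that setting changes, assuming PHP 5.6 or later (older versions used a fixed default for html_entity_decode() instead). When no charset argument is given, html_entity_decode() falls back to default_charset. get_html_translation_table() takes the charset as its third argument. On a web server, default_charset also sets the charset PHP sends in the Content-Type response header, which may be part of why changing it had a visible effect. The ini_set() call stands in for the php.ini line so the example runs on its own.

```php
<?php
// Equivalent of the php.ini fix from this thread:
//   default_charset = "iso-8859-15"
ini_set('default_charset', 'ISO-8859-15');

// No charset argument: html_entity_decode() uses default_charset (PHP >= 5.6).
$decoded = html_entity_decode('&ecirc;');
echo 'default decode: 0x', bin2hex($decoded), "\n";

// The translation table's keys are characters in the requested charset,
// its values are the corresponding entities.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-15');
$key = array_search('&ecirc;', $table, true);
echo 'table key for &ecirc;: 0x', bin2hex($key), "\n";
```

Both lines should show 0xea, the ISO-8859-15 byte for ê.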

I fear this is not very helpful to others, since we have a solution without knowing the exact cause of the error.
After all, on the cPanel server there was no default charset defined either, and the entities were always decoded correctly.
Weird.

Thanks for thinking along with me.
Notawiz.

StupidScript

11:19 pm on Jun 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good work. That was gnarly. Your solution will probably help someone in the future.

coopster

1:00 pm on Jun 3, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Also check the PHP Version that is running. A few bugs were reported before 4.3.
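A quick way to do that check, as a small sketch: PHP_VERSION holds the running version string, and version_compare() compares it against 4.3.0, the cut-off mentioned above.

```php
<?php
// Print the running PHP version and flag versions older than 4.3.0.
echo 'PHP ', PHP_VERSION, "\n";

$isOlderThan43 = version_compare(PHP_VERSION, '4.3.0', '<');
if ($isOlderThan43) {
    echo "Older than 4.3: some html_entity_decode() bugs may apply.\n";
}
```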