PHP Server Side Scripting Forum

    
Character encoding.
fm86




msg:4299820
 2:22 pm on Apr 18, 2011 (gmt 0)

Hi everybody!

I can't solve this apparently easy problem.

I have a string $str = "üß" and I'd like to convert it to "&uuml;&szlig;" or, even better, to "&#252;&#223;".

I tried
$str = htmlentities($str);

but all I get is: Ã¼Ã¼

What am I doing wrong?

 

lucy24




msg:4299900
 4:18 pm on Apr 18, 2011 (gmt 0)

I was going to say:

Your text was entered in UTF-8, with the two letters ü = C3BC and ß = C39F. It is being interpreted as ISO-Latin-1, giving the four letters C3 = Ã, BC = ¼, C3 = Ã, 9F = ... whoops! Where's that second ¼ coming from? You'd expect Ÿ.

¼ and are different names for the same character. Are you quoting your actual output?

Somewhere in the bowels of your software there has to be a setting that lets you tell it the encoding of the original text. What happens if your original text includes characters that aren't in ISO-Latin-1?
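
For PHP specifically, the "setting" is the third argument of htmlentities(). What follows is only a minimal sketch of the effect described above, assuming the string really holds UTF-8 bytes; the variable names are illustrative. Before PHP 5.4 the default charset of htmlentities() was ISO-8859-1, which is what the explicit 'ISO-8859-1' call reproduces here.

```php
<?php
// "üß" written as explicit UTF-8 bytes, so the example does not depend on
// the encoding this file happens to be saved in.
$str = "\xC3\xBC\xC3\x9F";

// Treating the four bytes as ISO-8859-1 encodes each byte on its own.
$wrong = htmlentities($str, ENT_QUOTES, 'ISO-8859-1');

// Naming the real encoding yields one entity per letter.
$right = htmlentities($str, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // starts with &Atilde;&frac14;&Atilde;
echo $right, "\n"; // &uuml;&szlig;
```

The charset argument describes the bytes going in; it does not convert anything by itself.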

fm86




msg:4300551
 5:40 pm on Apr 19, 2011 (gmt 0)

Hi and thanks for the reply!

You're right, the problem seems to be the encoding of the page. I was sending a header to say the output was going to be XML, and this was causing trouble. Now I have tried changing the code to this:

header ("Content-Type:text/xml");
print "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
$text = "üß";
die("<tag>$text</tag>");


But the characters are now shown as &#65535;&#65533;

Do you have any further suggestion?

lucy24




msg:4300567
 6:15 pm on Apr 19, 2011 (gmt 0)

Urk! Those are hex FFFF and FFFD, where the latter is the Unicode "replacement character" meaning "I can't deal with this". As it happens, the characters in question both occupy locations that are permitted in Latin-1 but not in UTF-8, in the 0080-009F range. So it sounds as if you have managed to turn the original problem on its head :-) That is, first you had UTF-8 characters being interpreted as Latin-1, and now you have Latin-1 characters being interpreted as UTF-8.

What is the encoding of your original file-- the one on your computer that you're looking at right now? If the file itself is in Latin-1, changing the HTML header to say UTF-8 (or vice versa, or any other permutation of encodings) will not change the text, it will simply make it display incorrectly. See what happens if you leave everything exactly the way it is, but change the "UTF-8" piece to "ISO-Latin-1" (or 8859-1 if that's what the software expects).

Disclaimer: I do not speak php, though I do know German ;-)
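
One way to check what the bytes actually are, rather than what the header claims, might look like the sketch below. It assumes PHP 7 or later with the iconv extension available; looks_like_utf8() is a hypothetical helper name, not a built-in function.

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8.
// With the /u modifier, PCRE refuses subjects that are not valid UTF-8,
// so preg_match() returns false instead of 1.
function looks_like_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$latin1 = "\xFC\xDF";          // "üß" as ISO-8859-1 bytes
$utf8   = "\xC3\xBC\xC3\x9F";  // "üß" as UTF-8 bytes

var_dump(looks_like_utf8($latin1)); // bool(false)
var_dump(looks_like_utf8($utf8));   // bool(true)

// Re-encode the Latin-1 bytes so that they match a UTF-8 declaration.
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);
var_dump($converted === $utf8);     // bool(true)
```

This can only say "not UTF-8". Any byte sequence is valid Latin-1, and plain ASCII text passes both tests, so the check cannot prove that a file is Latin-1.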

fm86




msg:4300604
 7:04 pm on Apr 19, 2011 (gmt 0)

Servus! :)

Sooo, it's very frustrating... Just to check, I changed the encoding of my file to UTF-8 and the characters no longer displayed properly. No wonder: that means the file was originally Latin-1. I tried changing the encoding declared in the XML, but that didn't work either.

I somehow solved the issue using utf8_encode(), but if I then run htmlentities() on the resulting string it gives me the &Atilde; again. Maybe it only works for Latin-1? I guess I'm missing something very important about character encoding.
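
A possible explanation is that the string is UTF-8 after utf8_encode(), while htmlentities() is still being told (or defaulting to) ISO-8859-1. The sketch below assumes PHP 7 or later with iconv, and uses iconv() for the Latin-1 to UTF-8 step. to_numeric_entities() is a hypothetical helper for the numeric form, which is the safer one in XML because XML does not define named entities such as &uuml;.

```php
<?php
// Hypothetical helper: replace every non-ASCII character of a UTF-8 string
// with a decimal numeric character reference such as &#252;.
function to_numeric_entities(string $utf8): string
{
    // Escape the markup characters first; non-ASCII is left untouched here.
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            // UCS-4BE is the code point as a 4-byte big-endian integer.
            $codePoint = unpack('N', iconv('UTF-8', 'UCS-4BE', $m[0]))[1];
            return '&#' . $codePoint . ';';
        },
        $escaped
    );
}

$latin1 = "\xFC\xDF";                           // "üß" as saved in a Latin-1 file
$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1); // now UTF-8 bytes

echo htmlentities($utf8, ENT_QUOTES, 'UTF-8'), "\n"; // &uuml;&szlig;
echo to_numeric_entities($utf8), "\n";               // &#252;&#223;
```

Either way, the encoding named in the call has to be the encoding the string is in at that moment, not the encoding it started out in.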

lucy24




msg:4300642
 7:50 pm on Apr 19, 2011 (gmt 0)

Can you take utf8_decode and either put that inside of the htmlentities command, or feed its result to htmlentities?
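
A rough sketch of that idea, assuming the string is UTF-8 at this point and that iconv is available. iconv() stands in for utf8_decode() here, since utf8_decode() is deprecated as of PHP 8.2; the two differ in that utf8_decode() substitutes "?" for characters that do not exist in Latin-1, while iconv() fails on them.

```php
<?php
$utf8 = "\xC3\xBC\xC3\x9F"; // "üß" as UTF-8 bytes

// Convert UTF-8 to ISO-8859-1, the job utf8_decode() did.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

// The charset argument must match the bytes being passed in.
echo htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'), "\n"; // &uuml;&szlig;
```

This route only works for text that fits into Latin-1, so passing 'UTF-8' to htmlentities() directly is likely the more general fix.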

:: grasping at straws ::

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved