

Outputting ISO-8859-1 database info to a UTF-8 page

Converting character sets

         

soulsizzle

4:33 pm on Apr 7, 2010 (gmt 0)

10+ Year Member



I have a database whose information is stored in ISO-8859-1 format. However, the final output that goes to the browser is encoded in UTF-8. My code currently is something along the lines of (very simplified):


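$query = "SELECT id, field FROM table";
// $connection is assumed to be an open ODBC connection from odbc_connect()
$result = odbc_exec($connection, $query);

while ($entry = odbc_fetch_array($result)) {
    // utf8_encode() treats its input as strict ISO-8859-1
    $output = utf8_encode($entry['field']);
    echo $output;
}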


However, using something similar to what I have above causes a number of characters, mostly punctuation, to "disappear". What's the best way to ensure nothing gets lost in translation?

jatar_k

4:38 pm on Apr 7, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



how are they being input? using a form with an 8859-1 header?

you need to change the input page to be utf-8, that will help but if you still get junk then the db needs a charset switch.

it also could be an issue with the column type, I recently was forced to learn about nvarchar.

soulsizzle

4:56 pm on Apr 7, 2010 (gmt 0)

10+ Year Member



Unfortunately, I have no control over the database. It's a legacy left over from my client's old webpage.

jatar_k

5:35 pm on Apr 7, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



what languages are you storing?
where are you noticing the issue?
are these characters that are correctly stored in db?

if the db is 8859-1 and the data is stored correctly then why do you need to change to utf-8?

soulsizzle

8:42 am on Apr 9, 2010 (gmt 0)

10+ Year Member



English only.

The characters are stored correctly and display correctly when outputting as-is within a webpage also encoded in 8859-1.

However, it is only one of many data sources, and the least important. Every other data source I am using is encoded in UTF-8. Therefore, the site I am developing outputs UTF-8 encoded HTML. If I do not try to re-encode, many of the characters are displayed incorrectly (as black diamonds with question marks in Firefox).
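
The black diamonds are what a browser shows for bytes that are not valid UTF-8. With several data sources in play, it can help to check which strings are already valid UTF-8 before touching them. A minimal sketch, assuming the mbstring extension is loaded; is_valid_utf8 is a hypothetical name, not something from the original code:

```php
<?php
// Hypothetical helper: true when the bytes already form valid UTF-8.
function is_valid_utf8($text)
{
    return mb_check_encoding($text, 'UTF-8');
}

var_dump(is_valid_utf8("plain ASCII"));     // pure ASCII is always valid UTF-8
var_dump(is_valid_utf8("\x93quoted\x94"));  // lone high bytes from a legacy source are not
```

A check like this only says whether conversion is needed. It cannot tell which legacy encoding the invalid bytes came from.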

jatar_k

1:04 pm on Apr 9, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



anything in the range below 128 is standard so all those chars should be fine, anything above that is subject to differences in mapping.
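
One possible explanation for the vanishing punctuation, offered as a guess: data labelled ISO-8859-1 is often really Windows-1252, which puts curly quotes, dashes and the ellipsis in the 0x80-0x9F range. utf8_encode() treats those bytes as ISO-8859-1 control characters, which browsers render as nothing. A minimal sketch, assuming the data really is Windows-1252 and the mbstring extension is available; legacy_to_utf8 is a hypothetical name:

```php
<?php
// Hypothetical helper: convert a legacy string to UTF-8, assuming the
// bytes are really Windows-1252 rather than strict ISO-8859-1.
function legacy_to_utf8($text)
{
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// 0x93 and 0x94 are the curly double quotes in Windows-1252.
$raw = "\x93quoted\x94";

// Print the converted bytes as hex so the result is visible on any terminal.
echo bin2hex(legacy_to_utf8($raw)), "\n";
```

In the loop from the first post, this would take the place of the utf8_encode() call. If the data turns out to be strict ISO-8859-1 after all, the result is the same for every byte outside 0x80-0x9F.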

If it has to stay as 8859-1 in the db then you could maybe store entities but this could cause problems if the content is going anywhere but a browser.
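
The entity idea can also be applied at output time instead of in the db, which leaves the stored data untouched. A rough sketch, assuming the source bytes are Windows-1252 (an assumption, not something confirmed in this thread) and that mbstring is available; legacy_to_entities is a hypothetical name:

```php
<?php
// Hypothetical helper: turn legacy text into pure ASCII by replacing every
// character above 127 with a numeric HTML entity.
function legacy_to_entities($text, $sourceEncoding = 'Windows-1252')
{
    // First get the text into UTF-8 so code points are unambiguous.
    $utf8 = mb_convert_encoding($text, 'UTF-8', $sourceEncoding);

    // Conversion map: first code point, last code point, offset, mask.
    $map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo legacy_to_entities("\x93quoted\x94"), "\n";
```

The output is plain ASCII, so it displays the same on a UTF-8 page or an 8859-1 page. It does not escape &, < or >, and as noted above the entities are only meaningful to something that parses HTML.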