Forum Moderators: open

Message Too Old, No Replies

PostgreSQL 8.1.4 UTF8 encoding error

         

windspinner

9:09 am on Aug 23, 2006 (gmt 0)

10+ Year Member



Hello, folks. This is the first time I installed PostgreSQL 8.1.4 on a UNIX machine, so I used rpms for that matter. Now, I got it running and all, but it seems like I screwed up installing the rpms.

When I am reading dumps, I am having trouble processing characters like "é". The only thing I've installed before I started the server were the following packages: libs, server, the engine itself, odbc, and jdbc. On Windows, when I didn't specify the encoding, it uses UTF8 and it does not have any problem when I encounter special characters like that "accented e." But in my LINUX box, I have to specify the encoding to KOI8R in order for my db to be able to read special characters like that.

I think I have wrongly installed postgresql. I've installed other rpms from Postgres's site, but it didn't work when I create a new db, even after I restarted the server. Anyone who's got an idea? Any help?

Here's the error I'm getting by the way:
ERROR: invalid byte sequence for encoding "UTF8": 0xe9273b

zCat

10:14 am on Aug 23, 2006 (gmt 0)

10+ Year Member



If your database is created UTF-8 and your data is in KOI8R, but your OS environment is UTF-8 (common in recent Linux distributions), you'll need to specifiy KOI8R as the encoding used by the client software (e.g. psql) which is importing the data. Then PostgreSQL should be able to convert it properly. If you don't set the client encoding to match the data encoding, PostgreSQL will assume it's UTF-8 but will fail with the kind of errors you've got.

windspinner

2:02 am on Aug 25, 2006 (gmt 0)

10+ Year Member



Thanks, zCat. I figured out that I am asking the wrong question. I thought that UTF8 can read all sorts of characters based from what I read in PostgreSQL's site, but apparently, I read it wrongly.

The default encoding my postgres uses is UTF8, what if I want it to use SQL_ASCII instead of UTF8? Is there any way to do it after your initdb phase (database initialization)?

zCat

4:26 am on Aug 25, 2006 (gmt 0)

10+ Year Member



Thanks, zCat. I figured out that I am asking the wrong question. I thought that UTF8 can read all sorts of characters based from what I read in PostgreSQL's site, but apparently, I read it wrongly.

Well, it has no problem storing Unicode characters of course... (I have one database which contains text in English, various European languages, Japanese and Chinese). You just need to make sure they are converted properly when being input into the database. PostgreSQL can usually do that for you if you tell it what encoding it's getting.

The default encoding my postgres uses is UTF8, what if I want it to use SQL_ASCII instead of UTF8? Is there any way to do it after your initdb phase (database initialization)?

No, you'd have to dump the entire database and reload it into a newly created database.

windspinner

4:48 am on Aug 25, 2006 (gmt 0)

10+ Year Member



Thanks again. So there is no other way then but to dump.

One of my DBs is in UTF8 encoding--the default encoding for my PostgreSQL 8.1.4--and when I execute the query "SELECT 'café'", I get that UTF8 error. Is that supposed to happen, is there anyway to avoid that?

zCat

6:49 am on Aug 25, 2006 (gmt 0)

10+ Year Member



It should be possible to avoid the error. In what context are you executing the statement (e.g. in the psql command line program, or via PHP / Java / whatever code)?

windspinner

7:25 am on Aug 25, 2006 (gmt 0)

10+ Year Member



In EMS SQL Manager, in the command line, and in a ColdFusion program. I guess there are various ways, through syntax, to avoid the error, but I am thinking of avoiding all the trouble and create a database with a different encoding other than UTF8, preferrably SQL_ASCII.

Anyway, if I am updating my PostgreSQL in Linux, does that mean that I need to create a dump and uninstall all my rpms? zCat, I also am trying to switch from Postgres 7.2.1 / 7.3 to 8.1.4, and I noticed that when you testing a blank value (intFieldName='') against an integer datatype would give you an error, as opposed to 7.2.1 which justs processes the query all right. There's so many release notes in Postgres's site, I don't know where to start seeking, when was this tighter datatype security implemented?

zCat

7:35 am on Aug 25, 2006 (gmt 0)

10+ Year Member



In EMS SQL Manager and in the command line.

On the command line (psql) you need to

SET client_encoding TO whatever
, "whatever" being the appropriate encoding for your environment, e.g. KOI8R or LATIN1.

Anyway, if I am updating my PostgreSQL in Linux, does that mean that I need to create a dump and uninstall all my rpms?

If you are upgrading between major versions (e.g. 8.0 to 8.1) you need to dump the database then upgrade the RPMs; between minor versions (e.g. 8.0.3 to 8.0.7) you just need to upgrade the RPMs.

windspinner

7:44 am on Aug 25, 2006 (gmt 0)

10+ Year Member



OK, thanks.