WebmasterWorld


Databases Forum

    
content encoding/decoding issue
Asian characters
LifeinAsia




msg:4516411
 6:33 pm on Nov 6, 2012 (gmt 0)

I have some content that users have entered through different forms. It's multilingual and sometimes there's been a mismatch with character formatting.

In one example, the Koreanization of "Allstate" was entered as "올스테이트" instead of in euc-kr formatting (which the rest of the site uses).

Although in the end it seems to display okay in browsers, I need to make it searchable within the DB.

How can I convert strings like this into euc-kr (or ideally, UTF-8)?
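A minimal sketch of one way to do this conversion, written in Python rather than ColdFusion. It assumes the stored value is a run of HTML numeric character references such as `&#50732;`; the thread does not show the raw bytes, so that is an assumption. The helper name `references_to_text` is hypothetical.

```python
import html

def references_to_text(stored: str) -> str:
    """Turn HTML character references (e.g. '&#50732;') into real characters."""
    return html.unescape(stored)

# Assumed raw form of the stored value: one decimal reference per Hangul syllable.
stored = "&#50732;&#49828;&#53580;&#51060;&#53944;"
text = references_to_text(stored)

# Once the value is real text, it can be encoded for whichever charset the site uses.
utf8_bytes = text.encode("utf-8")    # 3 bytes per Hangul syllable
euckr_bytes = text.encode("euc-kr")  # 2 bytes per Hangul syllable
```

Note that `html.unescape` also decodes named references such as `&amp;`, which may or may not be wanted for a given column.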

 

lucy24




msg:4516534
 9:45 pm on Nov 6, 2012 (gmt 0)

What language are we talking about? (The answer is not "Korean, you doofus, didn't I say that?" ;) ) The database itself doesn't do the work; it just stores what you give it. Different languages have different decoding commands. Sometimes more than one per language.

For comparison purposes: when I added full-spectrum percent decoding to my log-wrangling routine, which uses JavaScript, it turned out that you have to use one function for ASCII characters and a completely different one for the higher ranges. This makes no sense, but there you are.
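For comparison, a hedged Python sketch of the same idea: percent-decoding works on bytes, and the charset used to interpret those bytes has to be chosen explicitly. The sample character and variable names are illustrative only.

```python
from urllib.parse import quote, unquote

word = "올"

# Percent-encode the same character under two different charsets.
as_utf8 = quote(word, encoding="utf-8")    # three escaped bytes
as_euckr = quote(word, encoding="euc-kr")  # two escaped bytes

# Decoding only round-trips when the charset matches the one used to encode.
ok_utf8 = unquote(as_utf8, encoding="utf-8")
ok_euckr = unquote(as_euckr, encoding="euc-kr")

# Mismatched charset: no exception by default (errors="replace"), just wrong text.
wrong = unquote(as_euckr, encoding="utf-8")

# ASCII needs no special case: its bytes are the same in both charsets.
plain = unquote("%41%42", encoding="euc-kr")
```

In this sketch a single function covers both ASCII and the higher ranges; the part that varies is the `encoding` argument.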

If it's any consolation, occasional visits to the in-progress areas of Google In Your Language suggest that even they haven't got it figured out yet :)

LifeinAsia




msg:4516543
 9:56 pm on Nov 6, 2012 (gmt 0)

ColdFusion (version 9). I've tried CF's encode/decode functions, but there is no change.

vincevincevince




msg:4516620
 3:13 am on Nov 7, 2012 (gmt 0)

Generally, the text entered through the page will match the character set of the page. Make sure you are explicitly defining the character set in the headers and/or the HEAD section of the page.
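As a rough illustration of declaring the character set in both places, here is a Python sketch that builds a form page with a matching `Content-Type` header, `<meta charset>` tag, and `accept-charset` attribute. The function name `build_form_page`, the `/save` action, and the `company` field are hypothetical, and a ColdFusion site would set these through its own mechanisms.

```python
def build_form_page(charset="utf-8"):
    """Return (headers, body) for a form page that declares its charset consistently."""
    html_text = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta charset="{charset}">\n'
        "<title>Entry form</title>\n"
        "</head><body>\n"
        f'<form method="post" action="/save" accept-charset="{charset}">\n'
        '<textarea name="company"></textarea>\n'
        '<input type="submit">\n'
        "</form>\n"
        "</body></html>\n"
    )
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # Encode the body with the same charset that the header and meta tag declare.
    return headers, html_text.encode(charset)

headers, body = build_form_page("utf-8")
```

Keeping all three declarations driven by one `charset` value is the point of the sketch: a mismatch between them is the kind of situation the advice above is warning about.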

lucy24




msg:4516696
 7:05 am on Nov 7, 2012 (gmt 0)

"I have some content that users have entered through different forms."

Missed this bit. You mean different physical forms, as in <form>? Or informally, as in "in different ways"?

Are the information-entering points all on your site? All on the same page? How many different forms? You have to assume that your users aren't literally typing &#1400; or equivalent. So something is happening to some of your forms that isn't happening to the others. The solution may turn out to involve doing less rather than doing more :)

"I've tried CF's encode/decode functions, but there is no change."

Are you looking at the raw output or what ends up on your screen? I don't speak ColdFusion, but my experience with JavaScript suggests that there may be more than one encode/decode pair, and you have to find the right one.

LifeinAsia




msg:4516830
 5:54 pm on Nov 7, 2012 (gmt 0)

As far as how the data got there in the first place, it was because of the wrong character set on the input page. Fixed that (although there may be more).
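One plausible mechanism for how such data ends up stored, sketched in Python: when a form's character set cannot represent a submitted character, browsers typically substitute a numeric character reference. Python's `xmlcharrefreplace` error handler mimics that substitution. Using ISO-8859-1 as the "wrong" character set is an assumption; the thread does not say which one was in effect.

```python
word = "올스테이트"

# A charset with no Hangul cannot encode the word directly...
try:
    word.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False

# ...so each character falls back to a decimal reference, much as a browser would submit it.
submitted = word.encode("iso-8859-1", errors="xmlcharrefreplace")

# With the page set to a charset that covers Hangul, the characters go through as plain bytes.
submitted_ok = word.encode("euc-kr")
```

The references render correctly in a browser, which fits the observation that the text displays fine but does not match in a database search.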

For this particular issue, I was able to fix it manually by pulling the text from the DB into a TEXTAREA in a form with the correct character set and submitting to an update page.
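If more rows turn out to be affected, the manual round trip could in principle be replaced by a batch update. The following Python sketch assumes the bad rows contain numeric character references and uses an in-memory SQLite database as a stand-in; the `companies` table, its `name` column, and both function names are hypothetical.

```python
import html
import re
import sqlite3

# Matches decimal or hexadecimal numeric character references only.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")

def decode_numeric_refs(value: str) -> str:
    """Replace numeric references with characters; leave named ones like &amp; alone."""
    return NUMERIC_REF.sub(lambda m: html.unescape(m.group(0)), value)

def repair_column(conn: sqlite3.Connection) -> int:
    """Rewrite affected rows in the (hypothetical) companies.name column."""
    rows = conn.execute(
        "SELECT id, name FROM companies WHERE name LIKE '%&#%'"
    ).fetchall()
    fixed = 0
    for row_id, name in rows:
        decoded = decode_numeric_refs(name)
        if decoded != name:
            conn.execute(
                "UPDATE companies SET name = ? WHERE id = ?", (decoded, row_id)
            )
            fixed += 1
    conn.commit()
    return fixed

# Demo on an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [("&#50732;&#49828;&#53580;&#51060;&#53944;",), ("AT&amp;T",), ("올스테이트",)],
)
fixed_count = repair_column(conn)
names = [r[0] for r in conn.execute("SELECT name FROM companies ORDER BY id")]
```

Restricting the pattern to numeric references is a deliberate choice here, so that intentionally stored markup such as `&amp;` is left untouched. Running something like this against a copy of the data first would be prudent.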

"Are you looking at the raw output or what ends up on your screen?"

Yup- raw output.
