

Detecting Text Encoding and Normalizing Text

         

ocon

11:02 am on Aug 29, 2014 (gmt 0)




I am inserting text typed on a smartphone into my database. Users are typing special characters, so my database and connection are set up to use UTF-8.

In phpMyAdmin, the text for a few users is showing up as:

0a496e746f20776f726b696e67206f75742c2068697020686f702c20747261702c205226422c20726f636b2c207261702c202e[...]

But when I edit it, the text displays as (with some characters that look slightly off):

Into working out, hip hop, trap, R&B, rock, rap, intellectuals, health enthusiasts[...]

I'd like to normalize this data before inserting it into my database, while still supporting any special characters used on smartphones, such as emoji.
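"Normalize" can mean several things here. One possible reading, sketched below under the assumption that the submitted text already arrives as UTF-8: reject malformed UTF-8, trim surrounding whitespace (such as the leading newline, byte 0a, at the start of the hex dump), and apply Unicode NFC normalization only if PHP's intl extension (`Normalizer`) is loaded. The helper name `normalize_user_text` is hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: clean up user-submitted text before INSERT.
// Assumes the input is meant to be UTF-8; it does not guess other encodings.
function normalize_user_text(string $text): ?string
{
    // With the /u modifier, preg_match() returns false on invalid UTF-8,
    // so malformed input is rejected instead of being silently mangled.
    if (preg_match('//u', $text) !== 1) {
        return null;
    }

    // Strip leading/trailing ASCII whitespace (e.g. a stray "\n").
    // These bytes never occur inside a multi-byte UTF-8 sequence,
    // so emoji and other non-ASCII characters are left intact.
    $text = trim($text);

    // Optional: Unicode NFC normalization when the intl extension is present.
    // NFC composes e.g. "e" + combining acute accent into a single "é".
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
        if ($normalized !== false) {
            $text = $normalized;
        }
    }

    return $text;
}
```

For example, `normalize_user_text("\nInto working out, hip hop")` returns the string without the leading newline, an emoji such as `"\u{1F3A4}"` passes through unchanged, and a malformed byte sequence yields `null` so the caller can decide how to handle it.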

I first tried to see how this is encoded using mb_detect_encoding, but everything is reported as ASCII.
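Decoding the visible part of the hex value helps explain that result: every byte in it is below 0x80, so it is plain ASCII (with a leading newline), and ASCII bytes are also valid UTF-8 unchanged. A sketch, assuming the mbstring extension is loaded; the `[...]` tail of the hex string is not reproduced.

```php
<?php
// Visible prefix of the hex value shown in phpMyAdmin (the rest was elided).
$hex = '0a496e746f20776f726b696e67206f75742c2068697020686f702c20747261702c205226422c20726f636b2c207261702c202e';
$bytes = hex2bin($hex);

// No byte in the 0x80-0xFF range means the data is pure ASCII.
$ascii_only = !preg_match('/[\x80-\xFF]/', $bytes);

// With strict detection, pure ASCII input is reported as ASCII...
$detected = mb_detect_encoding($bytes, ['ASCII', 'UTF-8'], true);

// ...yet the same bytes are valid UTF-8, because ASCII is a subset of UTF-8.
$valid_utf8 = mb_check_encoding($bytes, 'UTF-8');

// Only input containing non-ASCII characters (here an emoji) reports as UTF-8.
$detected_emoji = mb_detect_encoding("hip hop \u{1F3A4}", ['ASCII', 'UTF-8'], true);

var_dump($bytes, $ascii_only, $detected, $valid_utf8, $detected_emoji);
```

So an "ASCII" result from mb_detect_encoding is not a sign of a problem; it only means those particular rows contain no non-ASCII characters.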

I don't know how to start this project, and I would appreciate any help. Thanks.

omoutop

1:22 pm on Aug 29, 2014 (gmt 0)




Is there any scenario where the saved data must be returned to its original source? For example, input from a smartphone being sent back to that phone?

As for the ASCII: pure ASCII data is already valid UTF-8.
If you still want to convert it, you can try something like:

$new_data = iconv('ASCII', 'UTF-8//IGNORE', $original_data);

(The //IGNORE suffix discards any characters that cannot be converted, just in case some of the input is not valid ASCII.)
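For input that really is pure ASCII, the conversion changes nothing, which is why it is optional. A small check, assuming the iconv extension is available; it leaves off `//IGNORE`, whose handling of invalid input is known to differ between the iconv implementations PHP can be built against.

```php
<?php
// Plain ASCII text, e.g. the decoded start of the stored value.
$original_data = "Into working out, hip hop, trap, R&B, rock, rap";

// Every ASCII character (0x00-0x7F) has the same single-byte encoding in
// UTF-8, so converting ASCII to UTF-8 leaves the bytes unchanged.
$new_data = iconv('ASCII', 'UTF-8', $original_data);

var_dump($new_data === $original_data); // bool(true)
```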

lucy24

9:42 pm on Aug 30, 2014 (gmt 0)




Quote: "But when I edit it the text displays as (with characters that look slightly off)"

Unfortunately, the WebmasterWorld forums themselves are limited to Latin-1 (ISO-8859-1, with an outside chance of Windows-1252). So whatever the copy-and-paste was meant to show got lost here :( (I toggled my browser's encoding manually; nothing extra showed up.)

If your smartphone users are typing emoji*, the text is already UTF-8. Your challenge is to make sure it never gets converted from UTF-8 to anything else.
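To see why a conversion away from UTF-8 is so damaging: a single-byte charset such as Latin-1 only covers U+0000 to U+00FF, so an emoji converted into it is replaced by a substitute character, and converting back cannot recover it. A sketch, assuming the mbstring extension is loaded:

```php
<?php
// Text as it might arrive from a phone: valid UTF-8 with an emoji (U+1F3B5).
$utf8 = "R&B \u{1F3B5}";

// Latin-1 has no code point for the emoji, so mbstring substitutes it.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Converting back to UTF-8 keeps the ASCII part but not the emoji.
$round_trip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($round_trip === $utf8); // bool(false)
```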


* A range added in Unicode 6.0 that Apple appears to be irrationally in love with. Seriously annoying to people who expect print to display as print.
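Most emoji sit above U+FFFF, so each one takes four bytes in UTF-8. If the database is MySQL (an assumption, though phpMyAdmin suggests it), that matters: MySQL's `utf8` character set (an alias for `utf8mb3`) stores at most three bytes per character, so emoji need `utf8mb4` on the column and on the connection, e.g. `charset=utf8mb4` in a PDO DSN. The helper below, `has_supplementary_chars`, is hypothetical and simply flags text containing such characters.

```php
<?php
// Hypothetical helper: does the text contain code points above U+FFFF
// (outside the Basic Multilingual Plane), where most emoji live?
function has_supplementary_chars(string $text): bool
{
    // With the /u modifier the class matches code points, not bytes.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$emoji = "\u{1F3A4}"; // MICROPHONE

// One character, but four bytes in UTF-8.
echo strlen($emoji), "\n"; // 4
var_dump(has_supplementary_chars("hip hop $emoji")); // bool(true)
```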