Detecting Text Encoding and Normalizing Text

     
11:02 am on Aug 29, 2014 (gmt 0)



I am inserting text typed on smartphones into my database. Users type special characters, so my database and connection are set up to use UTF-8.

In phpMyAdmin, the text for a few users shows up as:

0a496e746f20776f726b696e67206f75742c2068697020686f702c20747261702c205226422c20726f636b2c207261702c202e[...]

But when I edit it the text displays as (with characters that look slightly off):

Into working out, hip hop, trap, R&B, rock, rap, intellectuals, health enthusiasts[...]

I'd like to normalize this data before inserting it into my database, while still supporting any special characters used on smartphones, such as emoji.

I tried to first see how this is encoded by using mb_detect_encoding but everything is being listed as ASCII.

I don't know how to start this project and I would appreciate any help. Thanks.
1:22 pm on Aug 29, 2014 (gmt 0)



Is there any scenario where the saved data must go back to its original source? For example, is input from a smartphone ever sent back to that phone?

As for the ASCII strings: they are already valid UTF-8.
If you still want to convert them, you can try something like:

$new_data = iconv('ASCII', 'UTF-8//IGNORE', $original_data);

(The //IGNORE suffix discards any characters that cannot be converted, in case some of the input is not valid ASCII.)
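
To see why mb_detect_encoding reports ASCII, it can help to decode the hex string quoted in the first post. This is only a sketch: it assumes the mbstring extension is loaded, and it assumes phpMyAdmin shows hex because it treats the column as binary, which the thread does not confirm. Only the bytes actually quoted are used; the part elided with [...] is left out.

```php
<?php
// Hex bytes exactly as quoted in the first post (the rest was elided there).
$hex = '0a496e746f20776f726b696e67206f75742c2068697020686f702c20747261702c205226422c20726f636b2c207261702c202e';

// hex2bin() turns each pair of hex digits back into one raw byte.
$bytes = hex2bin($hex);

// Every byte in this sample is below 0x80, so it is pure ASCII,
// and any pure-ASCII string is also well-formed UTF-8.
$isAscii = mb_check_encoding($bytes, 'ASCII');
$isUtf8  = mb_check_encoding($bytes, 'UTF-8');

var_dump($isAscii, $isUtf8);
echo $bytes, "\n";
```

If both checks come back true for the stored values, the data itself is probably fine and the hex is just how it is being displayed.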
9:42 pm on Aug 30, 2014 (gmt 0)

lucy24


But when I edit it the text displays as (with characters that look slightly off)

Unfortunately the WebmasterWorld forums themselves are constrained to Latin-1 (8859-1 with an outside option on 1252). So anything illustrated by the copy-and-paste got lost here :( (I toggled my browser's encoding manually. Nothing extra showed up.)

If your smartphone users are typing emoji* then they are already in UTF-8. Your challenge is to make sure that nothing ever gets changed from UTF-8 to something else.


* Range added in, I think, Unicode 6.0 that Apple appears to be irrationally in love with. Seriously annoying to people who expect print to display as print.
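
One way to keep everything in UTF-8 is to validate input at the boundary instead of trying to detect and convert it. The sketch below assumes PHP 7 or later with the mbstring extension; is_clean_utf8 and needs_utf8mb4 are hypothetical helper names, not anything from this thread. Note that most emoji are 4-byte UTF-8 sequences, which MySQL's utf8 charset cannot store, so the column and the connection would need utf8mb4.

```php
<?php
// Hypothetical helper: true if $text is well-formed UTF-8.
function is_clean_utf8(string $text): bool
{
    return mb_check_encoding($text, 'UTF-8');
}

// Hypothetical helper: true if $text contains a code point above U+FFFF,
// i.e. a 4-byte UTF-8 sequence, which is where most emoji live.
function needs_utf8mb4(string $text): bool
{
    // The /u flag makes PCRE treat the pattern and the subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$plain  = "Into working out, hip hop";
$emoji  = "Gym time \u{1F4AA}";   // U+1F4AA, a 4-byte sequence in UTF-8
$broken = "caf\xE9";              // Latin-1 byte for e-acute, not valid UTF-8

var_dump(is_clean_utf8($plain), is_clean_utf8($emoji), is_clean_utf8($broken));
var_dump(needs_utf8mb4($plain), needs_utf8mb4($emoji));

// With PDO, the connection charset can be set in the DSN, for example
// 'mysql:host=...;dbname=...;charset=utf8mb4' (host and dbname are placeholders).
```

Input that fails is_clean_utf8 can be rejected or logged before it ever reaches the database, so nothing is silently converted on the way in.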
 
