Forum Moderators: rogerd & travelin cat

WordPress character encoding

smallcompany

8:23 am on Nov 16, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm wondering what determines how WordPress outputs text.

Here is what I have:

Page: <meta charset="UTF-8">

The content in both the Visual and Text editors in WP is:

You're nice.
Hi - hey


When I look at it in the database (in phpMyAdmin), it looks the same as in WP. The collation in MySQL is utf8_general_ci.

Yet, when I look into the source code in any browser, I get this:

You&#8217;re nice.
Hi &#8211; hey



Why is this? What overrides the UTF-8 setting?
I tried different themes, no change.

The settings in wp-config.php are:

define('DB_CHARSET', 'utf8');
define('DB_COLLATE', ''); // tried changing this to utf8_general_ci; it did not help


Thanks

lucy24

8:38 am on Nov 16, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's a very recent post by not2easy that explains how WordPress does this. (She'll know where to find it; I don't think it was in the WP subforum.)

The short version is that everything in the database gets converted to decimal HTML entities, which will display as intended in all browsers everywhere, regardless of charset. You could change the meta charset line to anything else, or omit it entirely, and there would be no effect on the displayed text.
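To illustrate the point about entities not depending on the charset, here is a minimal Python sketch. Python is used only for illustration: WordPress itself is PHP, and this is not WordPress code. The sample strings come from the first post, and the variable names are made up for the example. It shows that the entity form is pure ASCII, so it encodes to the same bytes under several charsets, while the literal character does not survive a charset mismatch.

```python
import html

raw = "You\u2019re nice."       # literal RIGHT SINGLE QUOTATION MARK (U+2019)
entity = "You&#8217;re nice."   # same text, written with a decimal entity

# The entity form contains only ASCII characters, so its bytes are
# identical no matter which of these charsets the page declares.
encodings = ("utf-8", "latin-1", "cp1252")
entity_bytes = {enc: entity.encode(enc) for enc in encodings}
same_everywhere = len(set(entity_bytes.values())) == 1

# A browser resolves the entity to the character; html.unescape does the same.
decoded = html.unescape(entity)

# The literal character is charset-sensitive: UTF-8 bytes misread as
# Windows-1252 turn into mojibake.
mojibake = raw.encode("utf-8").decode("cp1252")

print(same_everywhere)   # True
print(decoded == raw)    # True
print(mojibake)          # Youâ€™re nice.
```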

But darn, I do wish they'd used hexadecimal entities instead. I remember an unrelated post from, hm, maybe a few months ago. The upshot was that mobiles are happier with hexadecimal than decimal entities.

Give WordPress a few more years and maybe they'll figure out that once the page is set to UTF-8, there's no reason to clutter the code with entities. If there are too many of them, they do start affecting page size (2-4 bytes for each higher-range UTF-8 character, vs. 7-8 for a numeric entity).
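The byte counts above can be checked with a short Python sketch. The helper name entity_cost is hypothetical, and the two characters tested are the apostrophe and dash from the first post. It also confirms that the decimal and hexadecimal spellings resolve to the same character.

```python
import html

def entity_cost(ch):
    """Return (utf8_bytes, decimal_entity_len, hex_entity_len) for one character."""
    decimal_entity = f"&#{ord(ch)};"
    hex_entity = f"&#x{ord(ch):X};"
    # Both entity spellings must resolve to the original character.
    assert html.unescape(decimal_entity) == ch
    assert html.unescape(hex_entity) == ch
    return len(ch.encode("utf-8")), len(decimal_entity), len(hex_entity)

for ch in ("\u2019", "\u2013"):   # the apostrophe and dash from the sample post
    print(f"U+{ord(ch):04X}", entity_cost(ch))
# U+2019 (3, 7, 8)
# U+2013 (3, 7, 8)
```

For these two characters the literal UTF-8 form takes 3 bytes, against 7 for the decimal entity and 8 for the hexadecimal one.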

not2easy

4:13 pm on Nov 16, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Sorry, I didn't keep notes on that previous discussion, but there are a couple of reasons WP does this. One is an effort to prevent SQL injection attempts (in posts, for example, or URL requests); the other is that WP wants to be friendly to absolute newbies. People who don't have much experience with the nuances of charsets, text and HTML will often paste formatted text, such as from MS Word, into posts. If they paste formatted text into the 'Visual Editor', it gets converted into HTML entities right in the editor, where you can see it. If you use the 'HTML editor', results are less predictable. You don't see the entities on the page until you view the source code.

If you are trying to enter actual HTML or CSS code and don't want it converted visually in the text, it needs to be surrounded with <code> tags, which use a monospace font and eliminate all styling. You can use a text editor that lets you select the encoding for your text and stick to pasting in UTF-8 text to get around some of the issues. MS Word, the example I mentioned as a bad choice for a "plain text" editor, substitutes characters that look almost the same as the plain ones but are different characters (curly apostrophes and dashes in place of ' and -, for example).
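One way to see which look-alike characters a pasted string contains is to list their Unicode names. This is a small Python sketch, not anything WordPress does; the function name describe_non_ascii is hypothetical, and the "pasted" string assumes the word processor inserted U+2019 and U+2013.

```python
import unicodedata

def describe_non_ascii(text):
    """List (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]

pasted = "You\u2019re nice. Hi \u2013 hey"   # assumed word-processor output
typed = "You're nice. Hi - hey"              # plain keyboard characters

for item in describe_non_ascii(pasted):
    print(item)
# ('’', 'U+2019', 'RIGHT SINGLE QUOTATION MARK')
# ('–', 'U+2013', 'EN DASH')

print(describe_non_ascii(typed))   # [] because every character is ASCII
```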

For Windows, a free text editor like Notepad++ or PSPad is a better choice because it lets you set the text encoding. The free TextWrangler for Mac offers that too.

smallcompany

4:34 pm on Nov 16, 2014 (gmt 0)

Thanks very much to both.

I still don't get it, and here is why:

I have other WP installations where it’s stays it’s in both the browser and the source code (the way I want it).

How come?

I compared three different WP sites, on two servers. Only this latest one does this. All the installations are WP 4.0 now.

Copy/paste from Notepad++ set to UTF-8 does not change anything. This happens even when I type.

P.S.
I'm not trying to show the coding, just plain text.

not2easy

5:35 pm on Nov 16, 2014 (gmt 0)

Are you pasting it into the visual editor or the html editor? Is the server's sql configured for UTF-8? You can't change that with WP config; it would need to be changed in the server settings. If you go to phpMyAdmin, it will show you how the SQL tables are configured.

smallcompany

8:02 pm on Nov 16, 2014 (gmt 0)

Are you pasting it into the visual editor or the html editor?

I tried both pasting and typing in. No difference.

Is the server's sql configured for UTF-8?

utf8_general_ci

lucy24

9:47 pm on Nov 16, 2014 (gmt 0)

You know how when you were young, people would enrage you by saying "Growing up is the best revenge"? Rolling your own HTML will let you do anything you want, the way you want it ;) When you use a CMS, there's always a tradeoff.

smallcompany

10:15 pm on Nov 16, 2014 (gmt 0)

Rolling your own HTML will let you do anything you want, the way you want it ;) When you use a CMS, there's always a tradeoff

Absolutely true. I wanted to get a feel for how it would look if I moved an existing static HTML site into a WP installation that is responsive/mobile friendly. I may continue with my test in order to learn things, but I will do another development on my own as well.
The "shortcuts" in this CMS example usually take more of my time than if I just started from scratch.

Cheers and thanks

not2easy

10:31 pm on Nov 16, 2014 (gmt 0)

On the pasting question, what I was asking was which interface (Visual or HTML) of the text editor you used. I don't know whether typing vs. pasting makes a difference, but the interface can.

It looks like the SQL settings are not the issue.