The page is UTF-8 but input fields show ISO-8859

Just changing all to UTF-8

jetteroheller

5:54 pm on Aug 23, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the html head is
<meta http-equiv="content-type" content="text/html; charset=utf-8">

When I open the Google Chrome developer tools and enter this in the console, I get:
document.characterSet
"UTF-8"

But the values of input and textarea elements appear to come back ISO-8859-1 encoded.
I tested this with

<input type="text" name="test" id="test" value="test" onchange="alert(escape(this.value)+'='+this.value.length)">

I entered the 3 German special characters "äöü" (a, o and u with umlauts)
and the alert shows

%E4%F6%FC=3

This is the ISO-8859-1 encoding of "äöü".
Why does the alert not show the UTF-8 encoding?

Fotiman

6:04 pm on Aug 23, 2017 (gmt 0)


Short answer: the escape method will return the %xx value for those characters, regardless of document characterSet.

escape(this.value)


Note, escape has been deprecated and should not be used.
The hexadecimal form for characters, whose code unit value is 0xFF or less, is a two-digit escape sequence: %xx. For characters with a greater code unit, the four-digit format %uxxxx is used.

[developer.mozilla.org...]

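Thus the German characters, whose code unit values are all below 0xFF, escape to their two-digit hex values. Those values happen to match ISO-8859-1 because the first 256 Unicode code points are the same as ISO-8859-1; it says nothing about the encoding of the page.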
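
To illustrate, here is a small sketch you can paste into the console. The variable names are only for illustration; it assumes nothing beyond the built-in escape and charCodeAt.

```javascript
// The three characters from the original test.
const input = "äöü";

// escape() works on UTF-16 code units, not on the page encoding.
// Code units of 0xFF or less become %xx.
const escaped = escape(input);

// The same numbers, read directly from the string as hex code units.
const codeUnits = Array.from(input, ch =>
  ch.charCodeAt(0).toString(16).toUpperCase()
);

// A character above 0xFF gets the four-digit %uxxxx form instead.
const euro = escape("€");

console.log(escaped);    // %E4%F6%FC
console.log(codeUnits);  // [ 'E4', 'F6', 'FC' ]
console.log(euro);       // %u20AC
```

So the %E4%F6%FC output is what escape produces for these characters on any page, UTF-8 or not.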

jetteroheller

6:48 pm on Aug 23, 2017 (gmt 0)


Okay, what else should I use?

Fotiman

7:11 pm on Aug 23, 2017 (gmt 0)


What exactly are you trying to accomplish?

Fotiman

7:15 pm on Aug 23, 2017 (gmt 0)


If you just want to alert the German characters, then don't attempt to encode it at all, just do:

<input type="text" name="test" id="test" value="test" onchange="alert(this.value+'='+this.value.length)">

That will alert the German characters exactly as you entered them.

jetteroheller

7:22 pm on Aug 23, 2017 (gmt 0)


The content of the input field has to be passed to the server.
So I think I have to encode it to be passed by

try{xhttp.send(id+'='+escape_plus(d))}

where d is the content of the field

Fotiman

7:51 pm on Aug 23, 2017 (gmt 0)


Then you want encodeURIComponent.
[developer.mozilla.org...]
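
A minimal sketch of how that could look for the request body. The helper name buildBody is hypothetical, and I am assuming the server expects application/x-www-form-urlencoded data in UTF-8; adjust to whatever your escape_plus was doing.

```javascript
// Hypothetical helper: builds a "name=value" pair for a form-style POST body.
// encodeURIComponent() percent-encodes the UTF-8 bytes of each character.
function buildBody(id, value) {
  return encodeURIComponent(id) + "=" + encodeURIComponent(value);
}

const body = buildBody("test", "äöü");
console.log(body); // test=%C3%A4%C3%B6%C3%BC

// In the page this would then be sent along these lines (assumed setup):
//   xhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
//   xhttp.send(body);
```

Note that encodeURIComponent encodes a space as %20, not as +. Server-side form decoders generally accept both.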

brotherhood of LAN

9:24 pm on Aug 23, 2017 (gmt 0)


In the html head is
<meta http-equiv="content-type" content="text/html; charset=utf-8">


Worth noting that a Content-Type HTTP header will override any declaration made within the document.

Perhaps also relevant [w3schools.com...]

ergophobe

7:29 am on Dec 3, 2017 (gmt 0)


Just as a bit of a side note... It may be worth saying that there are multiple ways to represent diacritics, and therefore the counts can be different. It could be 2 or 3 bytes, depending on whether or not it's a composed character. Which one you get depends on your input device/software and your normalization scheme.

Twitter has a really nice rundown on the problem because character counting is sort of a big deal to them
[developer.twitter.com...]
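
A quick sketch of the difference for "ä", assuming a browser or Node version that has TextEncoder and String.prototype.normalize. The helper utf8Length is just an illustrative name.

```javascript
// Precomposed form: a single code point, U+00E4.
const composed = "\u00E4";
// Decomposed form: "a" followed by U+0308 (combining diaeresis).
const decomposed = "a\u0308";

// Counts the bytes the string occupies when encoded as UTF-8.
const utf8Length = s => new TextEncoder().encode(s).length;

console.log(composed.length, utf8Length(composed));     // 1 2
console.log(decomposed.length, utf8Length(decomposed)); // 2 3

// Normalizing to NFC turns the decomposed form into the composed one.
console.log(decomposed.normalize("NFC") === composed);  // true
```

Both strings look identical on screen, so normalizing before counting or comparing avoids surprises.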

lucy24

6:51 pm on Dec 3, 2017 (gmt 0)


It could be 2 or 3 bytes

It could be one byte in a traditional encoding (Mac, Windows).

ergophobe

6:00 am on Dec 4, 2017 (gmt 0)


But then you would know it wasn't UTF-8 and his test would fail. So that would answer his question.