


The page is UTF-8 but input fields show ISO-8859

Just changing all to UTF-8

     
5:54 pm on Aug 23, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member jetteroheller is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 22, 2005
posts: 3062
votes: 6


The HTML head contains
<meta http-equiv="content-type" content="text/html; charset=utf-8">

In the Google Chrome developer tools console,
document.characterSet
returns
"UTF-8"

But input and textarea values come back ISO-8859-1 encoded.
I tested this with

<input type="text" name="test" id="test" value="test" onchange="alert(escape(this.value) + '=' + this.value.length)">

I entered three German special characters, "äöü" (a, o and u with an umlaut),
and the alert shows

%E4%F6%FC=3

This is the ISO-8859-1 encoding of "äöü".
Why does the alert not show the UTF-8 encoding?
6:04 pm on Aug 23, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fotiman is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 17, 2005
posts: 5021
votes: 26


Short answer: escape() returns the %xx value for those characters regardless of the document's characterSet.

escape(this.value)


Note that escape() has been deprecated and should not be used.
The hexadecimal form for characters, whose code unit value is 0xFF or less, is a two-digit escape sequence: %xx. For characters with a greater code unit, the four-digit format %uxxxx is used.

[developer.mozilla.org...]

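Thus ä, ö and ü, whose code units (0xE4, 0xF6, 0xFC) are all 0xFF or less, escape to two-digit %xx values. Those values coincide with the ISO-8859-1 bytes only because the first 256 Unicode code points match ISO-8859-1; they say nothing about how the page itself is encoded.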
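A quick way to see this outside the page is to compare escape() with encodeURIComponent() in a JavaScript console or Node.js. This is a minimal sketch; the euro sign is not from the original test and is added only to show the %uxxxx form for a code unit above 0xFF.

```javascript
// escape() works on UTF-16 code units, not on the page's charset:
// units <= 0xFF become %xx, larger ones become %uxxxx.
const umlauts = "\u00E4\u00F6\u00FC"; // "äöü"
const euro = "\u20AC";                // "€", code unit above 0xFF

console.log(escape(umlauts)); // %E4%F6%FC
console.log(escape(euro));    // %u20AC

// encodeURIComponent() percent-encodes the UTF-8 bytes instead.
console.log(encodeURIComponent(umlauts)); // %C3%A4%C3%B6%C3%BC
console.log(encodeURIComponent(euro));    // %E2%82%AC
```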
6:48 pm on Aug 23, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member jetteroheller is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 22, 2005
posts: 3062
votes: 6


Okay, what else should I use?
7:11 pm on Aug 23, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fotiman is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 17, 2005
posts: 5021
votes: 26


What exactly are you trying to accomplish?
7:15 pm on Aug 23, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fotiman is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 17, 2005
posts: 5021
votes: 26


If you just want to alert the German characters, don't encode them at all; just do:

<input type="text" name="test" id="test" value="test" onchange="alert(this.value + '=' + this.value.length)">

That will alert the German characters exactly as you entered them.
7:22 pm on Aug 23, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member jetteroheller is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 22, 2005
posts: 3062
votes: 6


The content of the input field has to be passed to the server,
so I think I have to encode it before sending it with

try{xhttp.send(id+'='+escape_plus(d))}

where d is the content of the field
7:51 pm on Aug 23, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fotiman is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 17, 2005
posts: 5021
votes: 26


Then you want encodeURIComponent(), which percent-encodes the UTF-8 bytes of each character.
[developer.mozilla.org...]
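For the XMLHttpRequest case above, a minimal sketch might look like this. It assumes the server expects an application/x-www-form-urlencoded body decoded as UTF-8. buildFormBody is a hypothetical helper name, and the commented-out xhttp lines only mirror the poster's variable name; the URL and method are assumptions, not taken from the thread.

```javascript
// Hypothetical helper: turn { name: value } pairs into a urlencoded body.
// encodeURIComponent() percent-encodes each value's UTF-8 bytes, so
// "äöü" travels as %C3%A4%C3%B6%C3%BC regardless of the page's charset.
function buildFormBody(fields) {
  return Object.entries(fields)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
}

const body = buildFormBody({ test: "\u00E4\u00F6\u00FC" }); // field "test" = "äöü"
console.log(body); // test=%C3%A4%C3%B6%C3%BC

// In the browser the body would then be sent along these lines
// (URL and method are assumptions):
// xhttp.open("POST", "/save");
// xhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
// xhttp.send(body);
```

URLSearchParams is an alternative that produces the same UTF-8 percent-encoding for these characters, e.g. new URLSearchParams({ test: "äöü" }).toString(); it differs in writing spaces as + rather than %20.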
9:24 pm on Aug 23, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 30, 2002
posts:5046
votes: 60


In the html head is
<meta http-equiv="content-type" content="text/html; charset=utf-8">


It's worth noting that a charset given in the Content-Type HTTP header overrides any declaration made within the document.

Perhaps also relevant [w3schools.com...]
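The precedence described above can be sketched as a small function. This is a simplified model, not browser code: effectiveCharset is a hypothetical name, the windows-1252 fallback is an assumption (the HTML standard lets the default depend on the user's locale), and it includes the one thing that outranks the HTTP header, a UTF-8 byte order mark.

```javascript
// Simplified sketch of which charset declaration wins when a page is decoded:
// byte order mark > Content-Type HTTP header > <meta> declaration > fallback.
// Real browsers do more (label normalization, prescan limits, user overrides).
function effectiveCharset({ hasUtf8Bom, httpContentType, metaCharset }) {
  if (hasUtf8Bom) return "utf-8";
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(httpContentType || "");
  if (match) return match[1].toLowerCase();
  if (metaCharset) return metaCharset.toLowerCase();
  return "windows-1252"; // assumed fallback; browsers may guess differently
}

// A server sending ISO-8859-1 wins over <meta charset=utf-8>:
console.log(effectiveCharset({
  hasUtf8Bom: false,
  httpContentType: "text/html; charset=ISO-8859-1",
  metaCharset: "utf-8",
})); // iso-8859-1
```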
7:29 am on Dec 3, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 25, 2002
posts:8639
votes: 287


Just as a bit of a side note... It may be worth saying that there are multiple ways to represent diacritics, so the counts can differ. In UTF-8, "ä" can be 2 or 3 bytes, depending on whether it is stored as a precomposed character or as a base letter plus a combining mark, and which one you get depends on your input device/software and your normalization scheme.

Twitter has a really nice rundown on the problem, because character counting is sort of a big deal to them:
[developer.twitter.com...]
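A minimal sketch of the composed/decomposed difference using String.prototype.normalize() and TextEncoder. Note that this.value.length in the original test counts UTF-16 code units, so a decomposed "äöü" would have reported 6 rather than 3.

```javascript
// "ä" can be precomposed (U+00E4) or decomposed ("a" + U+0308 combining diaeresis).
const composed = "\u00E4\u00F6\u00FC";        // NFC form of "äöü"
const decomposed = composed.normalize("NFD"); // "a\u0308o\u0308u\u0308"

// Number of bytes the string occupies when encoded as UTF-8.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

console.log(composed.length, utf8Bytes(composed));     // 3 6
console.log(decomposed.length, utf8Bytes(decomposed)); // 6 9

// Normalizing before counting or comparing gives consistent results.
console.log(decomposed.normalize("NFC") === composed); // true
```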
6:51 pm on Dec 3, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15937
votes: 889


It could be 2 or 3 bytes

It could be one byte in a traditional encoding (Mac Roman, Windows-1252).
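A Node.js-only sketch of the byte counts. Buffer's "latin1" encoding stands in for a single-byte legacy charset here; Windows-1252 uses the same bytes as ISO-8859-1 for ä, ö and ü (the two differ only in the 0x80 to 0x9F range), while Mac Roman uses different byte values. The Latin-1 bytes are the same E4 F6 FC that escape() produced in the original test.

```javascript
// Same three characters, single-byte legacy encoding vs UTF-8 (Node.js Buffer).
const text = "\u00E4\u00F6\u00FC"; // "äöü"

const latin1 = Buffer.from(text, "latin1");
const utf8 = Buffer.from(text, "utf8");

console.log(latin1.length, latin1.toString("hex")); // 3 e4f6fc
console.log(utf8.length, utf8.toString("hex"));     // 6 c3a4c3b6c3bc
```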
6:00 am on Dec 4, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 25, 2002
posts:8639
votes: 287


But then you would know it wasn't UTF-8, and his test would fail. So that would answer his question.