Multiple language website

     
9:35 am on May 2, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Hi

I'm working on a website for a friend. He wants the site to be available in multiple languages.

I set up an SQL table called languages with two fields: the first is the language code (EN, TH, etc.) and the second contains the name of the language as written in that language (English, ภาษาไทย). The second field has datatype nvarchar(max). When I look at the contents of the table in SQL Server Management Studio, everything looks fine (the various languages are displayed correctly).

The problem is that, when I extract the language data from the table and try and display them in a page, the foreign languages come out as ?

I've Googled until my fingers are numb testing various suggestions concerning the doctype and utf-8 and many, many other things but I cannot get the languages to display correctly in any screen.

I was wondering if anyone out there has accomplished the above (multiple languages on a page) and can paste a simple HTML example (<html> down to </html>) that I can use to finally get past this problem.

Thanks, fingers crossed,

Mick
9:37 am on May 2, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Quick note. When I entered the above question the &#3616; etc. were actually displayed as Thai characters.
10:47 am on May 2, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 10, 2001
posts:1551
votes: 10


Make sure that all components along the processing path are using the same text encoding (most commonly utf-8).

Since your db monitor apparently gives you readable output, there are still two places where things can go wrong:
  • Some code between database and web output tries to convert the encoding while it shouldn't.
  • The declared charset of the web page doesn't match the encoding of your data.
5:13 pm on May 2, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15752
votes: 825


You've omitted one fairly crucial bit of information: what charset does the page HTML currently name?

the foreign languages come out as ?

Did you mean ? (a simple question mark) or did you mean a question mark inside a black diamond? This is important, because it tells us what encoding your text reader (in this case, the browser) is trying to use. Plain ? tends to mean a non-displaying character in Latin-1, while the black-diamond form is the same in UTF-8. A steady stream of ? is odd, though. For Thai you'd expect to see a sequence of

:: shuffling papers ::

à¸x and à¹x where "x" could be anything.
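
To make the three symptoms concrete, here is a small sketch in Python (chosen only for illustration; nothing in this thread uses it). The sample word comes from the original question, `classify_symptom` is a hypothetical helper, and its heuristics are deliberately crude: each symptom can have other causes than the one named.

```python
THAI = "ภาษาไทย"  # sample text from the original question

def classify_symptom(displayed: str) -> str:
    """Guess what went wrong from what the browser shows (rough heuristic)."""
    if "\ufffd" in displayed:
        # U+FFFD is the question mark inside a black diamond.
        return "bytes were decoded as UTF-8 but are not valid UTF-8"
    if "\u00e0\u00b8" in displayed or "\u00e0\u00b9" in displayed:
        # U+00E0 U+00B8 / U+00E0 U+00B9 are the Latin-1 view of bytes E0 B8 / E0 B9.
        return "UTF-8 bytes were decoded as Latin-1"
    if set(displayed) == {"?"}:
        return "text was converted to a charset without Thai before it was sent"
    return "no obvious encoding damage"

# Lossy conversion: Latin-1 has no Thai letters, so each one becomes a literal '?'.
lossy = THAI.encode("latin-1", errors="replace").decode("latin-1")

# Thai bytes in the legacy TIS-620 encoding, read as UTF-8: replacement characters.
diamond = THAI.encode("tis-620").decode("utf-8", errors="replace")

# Correct UTF-8 bytes, read as Latin-1: the two-character prefix pattern.
mojibake = THAI.encode("utf-8").decode("latin-1")

for label, shown in (("lossy", lossy), ("diamond", diamond), ("mojibake", mojibake)):
    print(label, "->", classify_symptom(shown))
```

The point of the sketch is that a plain ? usually means the damage happened before the bytes left the server, so no charset declaration in the HTML can repair it.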

the &#3616; etc. were actually displayed as Thai characters

:: pause for dirty look directed at various That Be who have intentionally chosen not to make this extremely simple fix ::
10:52 am on May 3, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Hi guys, thanks for the feedback.

From what I can make out, there is something in the CSS of the theme I am using. I moved the <meta charset="utf-8"> to below the CSS includes and the languages now display correctly.
7:08 pm on May 3, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15752
votes: 825


Oh, criminy, that will technically work but it's a horrible long-term fix. It means your stylesheets are getting read in a different character set than your html ... but honestly I don't see why it would make any difference, because how many non-ASCII characters are even in the stylesheet? *

If you've got the time to investigate further, I'm really curious. Meanwhile, make sure each page's <title> comes after the charset declaration if you're using anything other than ASCII and entities (which would be messy and hard-to-edit in a non-Roman script). Don't rely on it displaying correctly in your own browser alone; you may just happen to have the same default.
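
On "ASCII and entities": numeric character references such as &#3616; are pure ASCII, so they survive whatever charset the page ends up being read in. A minimal Python sketch (illustrative only; the sample title is made up) of converting text to and from that form:

```python
import html

title = "ภาษาไทย"  # hypothetical page title

# Replace every non-ASCII character with its decimal character reference.
ascii_only = title.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_only)  # a run of &#NNNN; references, the first being &#3616;

# A browser (or html.unescape) turns the references back into the characters.
print(html.unescape(ascii_only) == title)
```

This is robust but, as noted, unpleasant to edit by hand for a whole page of non-Roman text.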


* afaik, I've got one, in a directory-specific "content" declaration.
6:00 am on May 4, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 10, 2001
posts:1551
votes: 10


There's a @charset directive for use in CSS files, but according to the specification it is supposed to only declare the encoding of the CSS file itself (relevant for file names, font names, ::before/::after content, etc.). If that actually influenced how the HTML is displayed, then we'd be looking at a rather nasty (and unlikely) browser bug.

More likely: Does your theme also use JavaScript? It's much easier with that to do dumb things to the HTML document, such as inserting another meta charset header or whatnot.
12:39 pm on May 4, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Groan, I jumped the gun there!

It seems like the server rebooted itself last night (maybe a Microsoft update) - no more Thai language characters today. Not sure if my moving the charset actually did anything (although there were Thai characters for a while!).

The head declaration is like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en">
<head>
<meta name="viewport" content="width=device-width,initial-scale=1.0">
<link href="css/Rstyle.css" rel="stylesheet">
<meta charset="utf-8">
<title>My Title</title>
</head>

Then further down in the page it just uses the <%=Prompts(PPos)%> to display the loaded translations.

Lucy: I'm not doing anything tricky. Just loading from the SQL database and (trying to) display the text on the screen.
Bird: No JavaScript.

Confused.
1:56 pm on May 4, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fotiman is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 17, 2005
posts: 5021
votes: 26


Your charset declaration should be the very first meta tag inside your <head>. A couple things to look at:
1. What HTTP headers is your server sending? Look to see if the server is sending different charset information in the HTTP headers that might conflict with your meta tag.
2. If you view the raw data that the server is sending to the browser (try using Fiddler to examine it), does it look correct? In other words, is the browser just not displaying that data correctly?
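
For the first check, the charset travels as a parameter of the Content-Type header, and as far as I know browsers give that header priority over the meta tag. A rough Python sketch of pulling the charset out and comparing it with the page's declaration; the header values are invented examples, and `header_charset` and `conflicts` are hypothetical helpers, not part of Fiddler or any other tool mentioned here.

```python
from email.message import Message
from typing import Optional

def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def conflicts(content_type: str, meta_charset: str) -> bool:
    """True if the HTTP header names a charset that differs from the meta tag's."""
    declared = header_charset(content_type)
    return declared is not None and declared != meta_charset.lower()

print(conflicts("text/html; charset=ISO-8859-1", "utf-8"))  # header disagrees with meta
print(conflicts("text/html;charset=UTF-8", "utf-8"))        # header agrees with meta
print(conflicts("text/html", "utf-8"))                      # header names no charset
```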
2:37 pm on May 4, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


I should quickly point out that I am based in Thailand (hence the urgent requirement) so apologies for delays in responding.
3:11 pm on May 4, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Hi Fotiman. Just to say that I moved the charset to below the <head> tag. No Thai characters (you probably guessed that).
3:26 pm on May 4, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15752
votes: 825


<link href="css/Rstyle.css" rel="stylesheet">
<meta charset="utf-8">

The more I look at this section, the less I like it. (Incidentally, is "meta charset" valid XHTML? I thought the shorter form was only for HTML 5.)

If you look it up [w3.org], you'll find a long list of how file encoding for a stylesheet is determined. The default or fallback is: use whatever the HTML uses. (On w3's list the absolute default is UTF-8, but honestly I don't trust you-know-which browser to honor this.) If the stylesheet is referenced before the HTML's charset declaration, that would almost have to mean that the stylesheet's encoding matches the browser default ... and that, of course, is entirely up to the user. Note too that the server itself can set charsets. You said "microsoft" which to me implies IIS; there's a subforum for that.

I think you need to put your charset declaration back at the beginning of the html. It doesn't matter if it's before or after unrelated metas like "viewport", but there shouldn't be any other content before the charset is established. Once you're certain that you are sending the same information to all browsers, then we can home in on the issue.
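
One concrete reason to keep the declaration early: the HTML specification expects the meta charset element to fall entirely within the first 1024 bytes of the document. A minimal Python sketch of that check follows. `charset_declared_early` is a hypothetical helper, and the regular expression is a simplification that only matches the short `<meta charset=...>` form; it is not a real HTML parser.

```python
import re

# Simplified pattern for the short form only: <meta charset="...">
META_CHARSET = re.compile(rb"""<meta\s+charset\s*=\s*["']?([\w-]+)["']?\s*/?>""", re.IGNORECASE)

def charset_declared_early(page: bytes, limit: int = 1024) -> bool:
    """True if a complete <meta charset> element ends within the first `limit` bytes."""
    match = META_CHARSET.search(page)
    return match is not None and match.end() <= limit

# Made-up documents: one declares the charset at once, one after 1600 bytes of padding.
early = b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>t</title></head></html>'
late = b"<!DOCTYPE html><html><head>" + b"<!-- padding -->" * 100 + b'<meta charset="utf-8"></head></html>'

print(charset_declared_early(early))
print(charset_declared_early(late))
```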

Do the stylesheets themselves contain any non-ASCII characters? Do they, in turn, reference any text content? (I'm having a hard time picturing how, because normally a stylesheet wouldn't call on anything further except maybe fonts and images, but I may be overlooking something obvious.)

I tend to suspect that the whole stylesheet business is a red herring and the problem lies somewhere else. I seriously doubt it's an HTML issue, though. I would expect something with the database.
6:56 am on May 5, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Hello again Lucy

That's another great response.

I've moved the charset to the top of the <head>.

Even if I remove the CSS include, the languages are not displayed correctly; they're displayed as ?.

I agree that the database is looking suspect but when I look at the data in the table everything is displayed correctly.

I can't believe that something that should be so simple is stopping me from progressing the project...
10:34 am on May 5, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


OK, that was taking way too long with no end in sight.

I ended up creating text files with the various languages and saving those files in UTF-8 format. All displays correctly now.

Not the ideal solution but now I can get on with my work!

Thanks all.
3:56 pm on May 5, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Sorry, back again.

I overlooked the fact that our Thai customers will want to enter their name, address and other details in the Thai language (and potentially customers from Cambodia and other neighbouring countries will want to do the same in their own languages).

I really need to sort out this problem.

The existing code works and displays the correct characters when I load the Thai text from a text file, so I guess the issue may be with the database configuration?

The database table uses nvarchar and the database collation is set to SQL_Latin1_General_CP1_CI_AS - are there any other SQL settings that I have missed?

Help :(
4:19 pm on May 5, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member fotiman is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 17, 2005
posts: 5021
votes: 26


What sort of server stack are you using? For example, if you're developing an ASP site, then you would be better off using Resources for localization (https://msdn.microsoft.com/en-us/library/vstudio/fw69ke6f%28v=vs.100%29.aspx). For PHP, maybe gettext (http://php.net/manual/en/function.gettext.php).
12:22 pm on May 6, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:May 23, 2002
posts: 446
votes: 0


Sorted. I'm posting the solution here in case someone else searching for an answer to this problem finds this thread.

At the top of the code put the following:

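' Declare the response as HTML.
Response.ContentType = "text/html"
' Also send an explicit Content-Type header naming UTF-8.
Response.AddHeader "Content-Type", "text/html;charset=UTF-8"
' Send string output encoded as UTF-8 (Windows code page 65001).
Response.CodePage = 65001
' Name UTF-8 as the charset in the Content-Type header.
Response.CharSet = "UTF-8"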

Everything displays in all the various foreign (to me!) languages.
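
A plausible reading of why these lines help, sketched in Python rather than ASP: the script holds the text as Unicode, and the response code page decides how that text is turned into bytes. Everything named below is an illustrative assumption, not an ASP API: `send`, `browser_shows`, and the mapping of code-page numbers to Python codec names are hypothetical, and 1252 merely stands in for whatever default the server was using.

```python
# Hypothetical mapping from Windows code-page numbers to Python codec names.
CODECS = {1252: "cp1252", 65001: "utf-8"}

def send(text: str, code_page: int) -> bytes:
    """Simulate writing a Unicode string to the response under a given code page."""
    # Characters the code page cannot represent are replaced with '?'.
    return text.encode(CODECS[code_page], errors="replace")

def browser_shows(body: bytes, declared_charset: str) -> str:
    """Simulate the browser decoding the body with the declared charset."""
    return body.decode(declared_charset, errors="replace")

prompt = "ภาษาไทย"  # the kind of value loaded from the nvarchar column

before = browser_shows(send(prompt, 1252), "utf-8")   # Western code page, page says utf-8
after = browser_shows(send(prompt, 65001), "utf-8")   # UTF-8 code page, page says utf-8

print(before)
print(after)
```

Under this reading the database was fine all along: the Thai text was being flattened to ? on its way out of the script, which is why moving the meta tag around made no difference.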

bye.