Some general questions about Chinese websites

         

Shenron

6:13 pm on Aug 3, 2006 (gmt 0)

10+ Year Member



Hello,

A friend of mine translated my website into Chinese (I don't speak Chinese), so it's kind of hard for me to do any SEO.

I'd like to ask you some questions:
1. My charset is UTF-8, but I see that Baidu, yahoo.cn, and others are using GB2312. Which one is better, and why?

2. When I try google.cn, Google sends me to google.com. How can I test google.cn to check my website? Would a Chinese proxy prevent that?

Thanks

bill

1:16 am on Aug 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UTF-8 can be problematic with Asian languages particularly if you're working with a CMS or database. The recommended encoding for Simplified Chinese is GB2312. You can be assured that the widest possible audience can view your content with this charset encoding.

If you can reasonably predict that all of your site visitors will be using a certain browser (i.e., in an Intranet environment) then UTF-8 is not much of a gamble. Recent browser software doesn't have the display problems that have plagued Unicode with Asian languages. On the Internet in general it's tough to predict who is going to see your site and the software they'll be using. They could be surfing with a phone, a PDA or some other device that may not fully support Unicode. Even with modern software you really have to be careful that your input is correct as well.

If you want to be safe use the tried and true GB2312. UTF-8 is more of a benefit to the webmaster and can be problematic for the user to view.

Rendezvous

2:44 pm on Aug 5, 2006 (gmt 0)

10+ Year Member



I've been designing Chinese websites since 1995, back when you had to insert a space between each character just to get the text to wrap. :P

I would tend to disagree with the previous response. UTF-8 is supported just fine by most modern databases. Something like 97% of Chinese users are using IE5/6, which support UTF-8 just fine.

However, it is true that most sites in Taiwan still use BIG5 and most sites in China still use GB. This is not a problem - Baidu will convert your charset to GB as it crawls.

The GB issue is mainly a legacy issue, as many companies in China do not overhaul their sites that often, and webmasters tend to learn one way of doing something and then never learn the new way because they're so overworked. Over time I'm sure UTF-8 will catch on.

UTF-8 also offers countless advantages to the webmaster: you no longer have to deal with charset conversion issues, and it becomes relatively easy to take a database or HTML and port it from a GB site to a BIG5 site. Most editors default to UTF-8 these days, and PHP, JavaScript, ASP, etc. all support UTF-8 just fine. You also get the benefit of mixed content: Japanese, Chinese, Arabic, and Greek, for example, all on the same page. Yes, I know, I'm simplifying the process, but it certainly removes some MAJOR issues. I'm facing just such an issue now: I have a huge 9-year-old site which was coded half in GB and half in BIG5. Updating it is always a major pain. Of course, updating to UTF-8 will not be a walk in the park, but it will have to be done soon.
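
As a rough illustration of that migration step, here is a minimal Python sketch that re-encodes legacy GB and BIG5 bytes as UTF-8. The function name to_utf8 and the sample strings are made up for the example, and it assumes you already know which charset each file was saved in. Note that this only changes the byte encoding; it does not turn Simplified characters into Traditional ones or vice versa.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes from a legacy charset and re-encode them as UTF-8."""
    # A wrong source_charset usually raises UnicodeDecodeError here,
    # which is safer than silently writing garbage.
    text = raw.decode(source_charset)
    return text.encode("utf-8")


# Hypothetical sample data: the same word as two legacy editors saved it.
gb_page = "中文".encode("gb2312")   # bytes from the GB half of a site
big5_page = "中文".encode("big5")   # bytes from the BIG5 half of a site

# Different bytes on disk, identical bytes once both are UTF-8.
assert gb_page != big5_page
assert to_utf8(gb_page, "gb2312") == to_utf8(big5_page, "big5")
```

Once everything is stored as UTF-8, the two halves of a mixed site can share one database and one set of templates.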

[edited by: Rendezvous at 2:54 pm (utc) on Aug. 5, 2006]

bill

4:12 am on Aug 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Rendezvous, welcome to WebmasterWorld. Good to have you with us.

"I would tend to disagree with the previous response. UTF-8 is supported just fine by most modern databases."

I never said that UTF-8 wasn't supported by databases, but rather that it can be problematic. If the webmaster doesn't know what they're doing, the site can end up displaying illegible characters or, worse, receiving illegible form data submitted by customers.
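
To make the "illegible characters" failure concrete, here is a small Python sketch of what happens when text is stored in one charset and read back as another. The helper name render_as and the sample string are only for illustration; a real site would hit this through a mismatched meta tag, HTTP header, or database connection setting.

```python
def render_as(text: str, stored_as: str, read_as: str) -> str:
    """Encode text with one charset, then decode the bytes with another."""
    # errors="replace" mimics a browser showing placeholder glyphs
    # instead of refusing to render the page.
    return text.encode(stored_as).decode(read_as, errors="replace")


original = "中文网站"

# Page saved as UTF-8 but read as GB2312: the result is mojibake.
garbled = render_as(original, "utf-8", "gb2312")

# Page saved and read with the same charset: the text survives.
intact = render_as(original, "utf-8", "utf-8")

assert garbled != original
assert intact == original
```

The same mismatch in the other direction (GB form data decoded as UTF-8) is how unreadable customer submissions end up in a database.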

"Something like 97% of Chinese users are using IE5/6, which support UTF-8 just fine."

I'm not sure where you're getting the 97% figure, but my sites don't show IE quite that high. Sure, IE has a majority share, but when you're designing professional sites it's often wise to consider accessibility for everyone visiting your site. As I mentioned above, it's not always people on a PC accessing your site. It's not always people using a browser. It could be a robot or spider, which are notorious for their inability to handle all sorts of things. To the best of my knowledge the Chinese government still recommends GB2312, and that has been the safest charset recommendation for quite some time now.

Don't get me wrong. I like UTF-8. I have Chinese websites that use UTF-8. However, if you're going to use that charset then be aware that there are issues.

Rendezvous

3:08 pm on Aug 7, 2006 (gmt 0)

10+ Year Member



Hi Bill, yes, I misread your statement. However, in the case of MSSQL databases, it's as simple as creating an N column (such as NVARCHAR) and prefixing any string literals with N (such as N'my text' in an INSERT). If a webmaster has difficulty with that, such as when working with an off-the-shelf CMS not designed for UTF-8, then yeah, find a different solution, be it GB or whatever.

The Chinese govt may recommend GB, but that's a political, nationalistic stance, as is often the case in China. It certainly does not make it a fact that GB (in any of its various flavours) is a better system.

I can show you screenshots of my top China and Taiwan sites in Google Analytics. It's shocking, but I see 94% and 96.4% IE users respectively. As much as I hate IE, and despite the fact that my sites work equally well in Firefox, Opera, and IE, users in China and Taiwan are just not into switching browsers. The other 3~4% are ancient browsers, and I couldn't care less about supporting them. I should mention that these are general news and matchmaking sites which attract quite a varied audience. I also have a tech-related Chinese site, where Firefox and Opera usage is approximately 20%! BTW, since GA does not record hits from robots and crawlers, I am also using Sawmill to read the logs directly, and I'm seeing that as much as 30~40% of my total traffic comes from robots and crawlers. China seems to be infested with zombie spiders, many of them looking for qq.txt :)

I've only been interested in Baidu for the past couple of months, since they started referring heaps more users than Google, so I don't have long experience with them. However, they seem to have no problem crawling and caching my UTF-8 sites, even though Baidu outputs in GB. In fact, I see that BIG5 Taiwan sites are also listed and converted correctly. Further, the search and results functions seem to be working fine, so I don't seem to be experiencing ANY issues with Baidu. I'm curious what issues you or others are having.

I AM seeing one issue with the Slurp crawler, which appears to be getting links to my site from Baidu and then passing the GB query strings directly to my site without converting them to UTF-8 first. I had to add a GB->UTF-8 routine just to make Slurp happy.
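
For anyone hitting the same thing, a routine like that might look roughly like the Python sketch below. This is not my production code, just the general idea: the function name decode_query_value is made up, and it assumes that any value which is not valid UTF-8 arrived in some flavour of GB (decoded here with gb18030, which also covers GB2312 and GBK).

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(raw_value: str) -> str:
    """Percent-decode a query value, trying UTF-8 first and GB second."""
    # "+" means a space in form-style query strings.
    raw_bytes = unquote_to_bytes(raw_value.replace("+", " "))
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was sent as GB.
        return raw_bytes.decode("gb18030", errors="replace")


# The same search term as a UTF-8 link and as a GB link would send it.
utf8_term = decode_query_value("%E4%B8%AD%E6%96%87")
gb_term = decode_query_value("%D6%D0%CE%C4")
assert utf8_term == gb_term
```

The UTF-8-first order matters because a few GB byte sequences also happen to be valid UTF-8, so a guess like this can occasionally be wrong on short strings.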

But I guess all this is what makes building sites and SEO in Asia much more interesting than elsewhere!

[edited by: Rendezvous at 3:09 pm (utc) on Aug. 7, 2006]

echolu

9:16 am on Aug 23, 2006 (gmt 0)

10+ Year Member



I am Chinese, and we all use GB2312 in China, so I think it would be the better choice. Most Chinese users like to search with Baidu and Google. I think you already know how to do SEO for Google, and it works the same way as on google.com. Baidu is different, though, and there are two points to pay attention to:
1. Don't put a keyword list in your title. Baidu prefers a title with your company's name and your service (including the keyword only once).
2. Don't put keywords at the top of your pages.

If you do, Baidu will treat your site as spam, and it will never get a good position!

That is just my opinion. Hope it helps!