A friend of mine translated my website into Chinese (I don't speak Chinese), so it's kind of hard to do any SEO.
I'd like to ask you some questions:
1. My charset is UTF-8, but I see that Baidu, yahoo.cn and others are using GB2312. Which one is better, and why?
2. When I try google.cn, Google sends me to google.com. How can I test google.cn to check my website? Would a Chinese proxy prevent that redirect?
Thanks
If you can reasonably predict that all of your site visitors will be using a certain browser (e.g., in an intranet environment), then UTF-8 is not much of a gamble. Recent browser software doesn't have the display problems that have plagued Unicode with Asian languages. On the Internet in general, it's tough to predict who is going to see your site and what software they'll be using. They could be surfing with a phone, a PDA or some other device that may not fully support Unicode. Even with modern software, you really have to be careful that your input is correctly encoded as well.
If you want to be safe, use the tried and true GB2312. UTF-8 is more of a benefit to the webmaster and can be problematic for the user to view.
I would tend to disagree with the previous response. UTF-8 is supported just fine by most modern databases. Something like 97% of Chinese users are using IE5/6, which support UTF-8 just fine.
However, it is true that most sites in Taiwan still use BIG5 and most sites in China still use GB. This is not a problem - Baidu will convert your charset to GB as it crawls.
The GB issue is mainly a legacy issue, as many companies in China do not overhaul their sites that often, and webmasters tend to learn one way of doing something and then never learn the new way because they're so overworked. Over time I'm sure UTF-8 will catch on.
UTF-8 also offers countless advantages to the webmaster. Since you no longer have to deal with charset conversion issues, it becomes relatively easy to take a database or HTML and port it from a GB site to a BIG5 site. Most editors default to UTF-8 these days, and PHP, JavaScript, ASP etc. all support UTF-8 just fine. You also have the benefit of enabling mixed content: Japanese, Chinese, Arabic, Greek, etc., for example, all on the same page. Yes, I know, I'm simplifying the process description, but it certainly removes some MAJOR issues. I'm facing just such an issue now: I have a huge 9-year-old site which was coded half in GB and half in BIG5. Updating it is always a major pain. Of course, updating to UTF-8 will not be a walk in the park, but it will have to be done soon.
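To make the conversion step concrete, here is a minimal sketch in Python of re-encoding legacy GB2312 and BIG5 bytes as UTF-8. The function name `to_utf8` and the sample text are just illustrative assumptions, not anything from the site described above, and a real migration would also have to deal with mislabeled or mixed-charset pages.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes stored in a legacy charset and re-encode them as UTF-8.

    source_charset is a Python codec name such as "gb2312" or "big5".
    Raises UnicodeDecodeError if the bytes are not valid in that charset.
    """
    text = raw.decode(source_charset)
    return text.encode("utf-8")


# The same two characters as they would be stored by a GB site and a BIG5 site.
gb_bytes = "中文".encode("gb2312")
big5_bytes = "中文".encode("big5")

# The legacy encodings differ from each other...
assert gb_bytes != big5_bytes

# ...but after conversion both sources yield identical UTF-8 bytes.
assert to_utf8(gb_bytes, "gb2312") == to_utf8(big5_bytes, "big5")
```

Note that this only changes the encoding. It does not convert between simplified and traditional characters, which is a separate job.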
[edited by: Rendezvous at 2:54 pm (utc) on Aug. 5, 2006]
"I would tend to disagree with the previous response. UTF-8 is supported just fine by most modern databases. Something like 97% of Chinese users are using IE5/6, which support UTF-8 just fine."
Don't get me wrong. I like UTF-8. I have Chinese websites that use UTF-8. However, if you're going to use that charset then be aware that there are issues.
The Chinese government may recommend GB, but that's a political, nationalistic stance, as is often the case in China. It certainly does not make it a fact that GB (in any of its various flavours) is a better system.
I can show you screenshots of my top China and Taiwan sites in Google Analytics. It's shocking, but I see 94% and 96.4% IE users respectively. As much as I hate IE, and despite the fact that my sites work equally well in Firefox, Opera and IE, users in China and Taiwan are just not into switching browsers. The other 3~4% are ancient browsers, and I couldn't care less about supporting them. I should mention that these are general news and matchmaking sites which attract quite a varied audience. I also have a tech-related Chinese site, and there Firefox and Opera usage is approx. 20%! BTW, since GA does not record hits from robots and crawlers, I am also using Sawmill to read the logs directly, and I'm seeing that as much as 30~40% of my total traffic is coming from robots and crawlers. China seems to be infested with zombie spiders, many of them looking for qq.txt :)
I've only been interested in Baidu for the past couple of months, since they started referring heaps more users than Google, so I don't have long experience with them. However, they seem to have no problem crawling and caching my UTF-8 sites, even though Baidu outputs in GB. In fact, I see BIG5 Taiwan sites are also listed and converted correctly. Further, the search and results functions seem to be working fine, so I don't seem to be experiencing ANY issues with Baidu. I'm curious what issues you or others are having.
I AM seeing one issue with the Slurp crawler, which appears to be getting links to my site from Baidu and then passing the GB query strings directly to my site without converting them to UTF-8 first. I had to add a GB-to-UTF-8 routine just to make Slurp happy.
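For anyone without a log analyzer, a rough version of that crawler-share check can be sketched in a few lines of Python. This is not how Sawmill works. It assumes the server writes the common "combined" log format, where the user agent is the last quoted field, and the marker list `BOT_MARKERS` is a hypothetical starting point that you would need to extend for your own traffic.

```python
import re

# Hypothetical substrings that often appear in crawler user agents.
BOT_MARKERS = ("bot", "spider", "crawler", "slurp")

# In the combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def crawler_share(log_lines):
    """Return the fraction of parseable requests whose user agent looks like a crawler."""
    total = 0
    bots = 0
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        total += 1
        agent = match.group(1).lower()
        if any(marker in agent for marker in BOT_MARKERS):
            bots += 1
    return bots / total if total else 0.0


# Two made-up log lines: one browser hit and one spider hit.
sample = [
    '192.0.2.1 - - [07/Aug/2006:15:09:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '198.51.100.7 - - [07/Aug/2006:15:09:01 +0000] "GET /qq.txt HTTP/1.1" 404 0 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
]
assert crawler_share(sample) == 0.5
```

Crawlers that fake a browser user agent will be missed by a check like this, so treat the result as a lower bound.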
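The routine I mean is roughly the following idea, sketched here in Python rather than the code actually running on my site. The function name `decode_query_value` is made up for the example, and the "try UTF-8 first, then fall back to GBK" order is an assumption that works because GB-encoded Chinese text is usually not valid UTF-8. It is a heuristic, not a guarantee.

```python
from urllib.parse import quote, unquote_to_bytes


def decode_query_value(raw_value: str) -> str:
    """Percent-decode a query string value, trying UTF-8 first and GBK second."""
    # Treat "+" as a space, as form-encoded query strings do.
    data = unquote_to_bytes(raw_value.replace("+", " "))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # GBK is a superset of GB2312, so it covers both kinds of GB input.
        return data.decode("gbk", errors="replace")


# The same search term as a GB-encoded and as a UTF-8-encoded query value.
gb_query = quote("中文", encoding="gb2312")
utf8_query = quote("中文")  # quote() defaults to UTF-8

assert gb_query != utf8_query
assert decode_query_value(gb_query) == decode_query_value(utf8_query) == "中文"
```

Short GB byte sequences can occasionally happen to be valid UTF-8 too, so if the crawler can be identified reliably, checking the user agent first is the safer choice.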
But I guess all this is what makes building sites and SEO in Asia much more interesting than elsewhere!
[edited by: Rendezvous at 3:09 pm (utc) on Aug. 7, 2006]
If so, Baidu will treat your site as spam, and it will never get a good position!
The above is just my opinion. Hope it helps!