Is Baidu staying away from UTF-8 encoded Chinese sites?
Does Baidu prefer GB2312 encoding?
chal00d
7:22 am on Nov 7, 2008 (gmt 0)
This has been touched on within the forums, but I was hoping to get some feedback from people's experiences with Chinese language character encoding, specifically in relation to Baidu.
Has anyone out there tested the theory that Baiduspider pays more attention to sites using GB2312 than UTF-8?
bill
8:11 am on Nov 7, 2008 (gmt 0)
I haven't read of any significant testing on a level that would be definitive. My GB2312 sites have migrated to UTF-8 and they do fine in Baidu.
UTF-8 is the default standard for a lot of CMS and blog packages so I doubt it would be penalized by Baidu.
What have your observations been?
chal00d
8:46 am on Nov 7, 2008 (gmt 0)
My experience (with charset UTF-8) is that Baiduspider typically visits around 20 pages and leaves, compared with page counts numbering in the thousands from Yahoo Slurp, Googlebot and others. So I'm really just trying to eliminate any potential roadblocks right now...
I see the majority of 'popular' Chinese language sites sticking with GB2312 and have heard mention that it could be favoured, though UTF-8 doesn't appear to be an issue for the smaller number of sites I've found using it.
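For anyone trying to rule out encoding as a roadblock, one quick sanity check is whether the charset declared in the HTTP Content-Type header matches the one declared in the page's meta tag, since a mismatch could confuse any crawler. The sketch below is only an illustration under stated assumptions: the function names are made up for this example, the regular expressions are a rough approximation rather than a full HTML parser, and nothing here reflects how Baiduspider itself detects encoding.

```python
import re

# Rough pattern for a charset declared inside a <meta> tag (assumption: the
# declaration appears in the first few KB of the page, as is conventional).
META_CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    match = re.search(
        r"charset\s*=\s*\"?([A-Za-z0-9_\-]+)", content_type or "", re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def charset_from_meta(html_bytes):
    """Return the charset declared in a <meta> tag of the raw page bytes, or None."""
    match = META_CHARSET_RE.search(html_bytes[:4096])
    return match.group(1).decode("ascii").lower() if match else None


def charsets_agree(content_type, html_bytes):
    """True unless both declarations are present and differ."""
    header = charset_from_header(content_type)
    meta = charset_from_meta(html_bytes)
    return header is None or meta is None or header == meta
```

Calling `charsets_agree("text/html; charset=utf-8", page_bytes)` on a page whose meta tag still says `gb2312` would return `False`, which is the kind of leftover that can survive a GB2312 to UTF-8 migration.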
bill
8:55 am on Nov 7, 2008 (gmt 0)
Slurp and Googlebot can go overboard at times on some sites, but Baiduspider is the top for my Chinese language UTF-8 sites in terms of activity. In fact, Baiduspider is more than twice as active compared to Yahoo Slurp (#2).
chal00d
9:09 am on Nov 7, 2008 (gmt 0)
I'd love to be seeing that, though right now Baiduspider is #15 in terms of crawling activity, hence the red flag.
Thanks for the comments, Bill. If encoding's not the issue I'll continue looking...
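The crawler rankings compared in this thread (Baiduspider at #1 on one set of sites, #15 on another) can be reproduced from raw access logs. Below is a minimal sketch, assuming each log line contains the User-Agent string and that the lowercase substrings in `CRAWLER_TOKENS` are enough to identify each bot; both the token table and the function names are illustrative assumptions, not taken from any log analysis package.

```python
from collections import Counter

# Assumed substrings that identify each crawler in a log line's User-Agent.
# Extend this table with whatever bots show up in your own logs.
CRAWLER_TOKENS = {
    "Baiduspider": "baiduspider",
    "Googlebot": "googlebot",
    "Yahoo Slurp": "slurp",
}


def count_crawler_hits(log_lines):
    """Count requests per crawler by matching tokens case-insensitively."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name, token in CRAWLER_TOKENS.items():
            if token in lowered:
                counts[name] += 1
                break  # attribute each line to at most one crawler
    return counts


def rank_crawlers(log_lines):
    """Return (crawler, hits) pairs, most active crawler first."""
    return count_crawler_hits(log_lines).most_common()
```

Feeding it the lines of an access log, for example `rank_crawlers(open("access.log", encoding="utf-8", errors="replace"))` with a hypothetical file name, gives a list to compare across sites or across time, for instance before and after an encoding change.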