Is Baidu staying away from UTF-8 encoded Chinese sites?

Does Baidu prefer GB2312 encoding?

         

chal00d

7:22 am on Nov 7, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



This has been touched on elsewhere in the forums, but I was hoping to get some feedback on people's experiences with Chinese-language character encoding, specifically in relation to Baidu.

Has anyone out there tested the theory that Baiduspider pays more attention to sites using GB2312 than UTF-8?

bill

8:11 am on Nov 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't read of any testing rigorous enough to be definitive. My GB2312 sites have migrated to UTF-8 and they do fine in Baidu.

UTF-8 is the default for a lot of CMS and blog packages, so I doubt it would be penalized by Baidu.

What have your observations been?
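
For anyone weighing the same GB2312-to-UTF-8 migration, here is a minimal Python sketch of re-encoding a single page. It is not the method used on the sites above, only an illustration: the function name `gb2312_page_to_utf8` and the sample markup are made up, and it assumes the page really is valid GB2312 (pages labelled GB2312 sometimes contain characters from the larger GBK set, in which case the `gbk` codec would be the one to try).

```python
import re

# Matches a charset declaration such as charset=gb2312 or charset="GB2312".
CHARSET_RE = re.compile(r'(charset\s*=\s*["\']?)gb2312', re.IGNORECASE)


def gb2312_page_to_utf8(raw: bytes) -> bytes:
    """Decode GB2312 bytes, relabel the charset declaration, re-encode as UTF-8."""
    text = raw.decode("gb2312")  # raises UnicodeDecodeError on invalid input
    text = CHARSET_RE.sub(r"\g<1>utf-8", text)
    return text.encode("utf-8")


# Hypothetical sample page, encoded the way an old GB2312 site would serve it.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
    "<p>百度</p>"
).encode("gb2312")

converted = gb2312_page_to_utf8(page)
```

The declared charset has to change along with the bytes; otherwise a crawler that trusts the meta tag would misread the re-encoded text.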

chal00d

8:46 am on Nov 7, 2008 (gmt 0)

My experience (with charset UTF-8) is that Baiduspider typically visits around 20 pages and leaves, compared with page counts in the thousands from Yahoo Slurp, Googlebot and others. So I'm really just trying to eliminate any potential roadblocks right now...

I see the majority of 'popular' Chinese-language sites sticking with GB2312, and I've heard it mentioned that it could be favoured, though using UTF-8 doesn't appear to be an issue for the few sites I've found using it.
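
The per-crawler page counts compared above can be tallied from a server access log. The following is a rough Python sketch, not a description of how the numbers in this thread were gathered: the function name `count_crawler_hits` and the sample log lines are invented for illustration, and it assumes each log line contains the user-agent string, as in the common "combined" log format.

```python
from collections import Counter

# Substrings that identify each crawler in the user-agent field.
CRAWLERS = ["Baiduspider", "Googlebot", "Slurp"]


def count_crawler_hits(log_lines):
    """Count requests per crawler by case-insensitive user-agent substring match."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name in CRAWLERS:
            if name.lower() in lowered:
                hits[name] += 1
                break  # attribute each request to at most one crawler
    return hits


# Invented sample lines; a real run would read the server's own log file.
sample_log = [
    '1.2.3.4 - - [07/Nov/2008:07:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '1.2.3.5 - - [07/Nov/2008:07:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '1.2.3.6 - - [07/Nov/2008:07:00:03 +0000] "GET /b HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '1.2.3.4 - - [07/Nov/2008:07:00:04 +0000] "GET /c HTTP/1.1" 200 512 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '1.2.3.7 - - [07/Nov/2008:07:00:05 +0000] "GET /d HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]

hits = count_crawler_hits(sample_log)
for name, count in hits.most_common():
    print(name, count)
```

Substring matching on the user agent is easy to spoof, so counts like these are only a first approximation of real crawler activity.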

bill

8:55 am on Nov 7, 2008 (gmt 0)

Slurp and Googlebot can go overboard at times on some sites, but Baiduspider is the most active crawler on my Chinese-language UTF-8 sites. In fact, Baiduspider is more than twice as active as Yahoo Slurp (#2).

chal00d

9:09 am on Nov 7, 2008 (gmt 0)

I'd love to be seeing that, though right now Baiduspider is #15 in terms of crawling activity, hence the red flag.

Thanks for the comments, Bill. If encoding's not the issue, I'll continue looking...