Right now we're getting penalized by Baidu for dupe content. I was thinking about blocking Baidu from crawling one of the sites, most likely the .hk one, in the hope that this will boost our rankings for the .cn site, which is obviously the larger market. However, Yahoo has always sent decent traffic to the .hk domain, so if there were any danger of also blocking Yahoo, I'd need to seriously rethink this strategy. On top of that, I've heard Baidu doesn't necessarily pay any attention to 'ignore' requests anyway.
I'd welcome any previous experiences with the Chinese Simplified vs. Traditional issue, as I'm really stuck on this one, having spent a lot of time and effort getting the sites set up and translated in the first place.
Has anyone been here before?
I've got similar content sites in Simplified and Traditional, but they're not close enough to be called dupes, so I'm afraid that won't help you much. Yahoo and Google have been very good at ranking each site for its respective content. Baidu has always favored the Simplified site, but for me that's the older, more established site anyway.
However, since my original post, it has transpired that we don't know how to block Baidu from one of these sites even if we wanted to.
A more pertinent question might actually be: is there an equivalent of the robots.txt exclusion standard that Baidu recognizes?
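For reference, this is what a robots.txt that shuts out only Baidu would look like, checked here with Python's standard `urllib.robotparser`. It's a minimal sketch, assuming the crawler identifies itself with the `Baiduspider` token and that Yahoo's crawler uses `Slurp`; it only shows how a *compliant* crawler would read the file. Whether Baiduspider actually honors it is exactly the open question in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the .hk site: disallow Baiduspider everywhere,
# leave every other crawler (e.g. Yahoo's Slurp) unrestricted.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Baiduspider", "Slurp", "Googlebot"):
        # can_fetch() reports what a crawler obeying the file would do.
        print(agent, rp.can_fetch(agent, "/index.html"))
```

The empty `Disallow:` under `User-agent: *` means "allow everything", so only the Baiduspider group is affected.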
If Baiduspider is still not checking or obeying robots.txt, as has been reported [webmasterworld.com], then you might want to use .htaccess to ban it instead.
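The idea behind a server-side ban is simply to refuse requests whose User-Agent contains the crawler's name, regardless of robots.txt. On Apache that is done in .htaccess (e.g. with `SetEnvIf` or mod_rewrite); the sketch below shows the same logic as a Python WSGI middleware instead, purely as an illustration. The `BLOCKED_AGENTS` list, `hello_app` and `status_for` are hypothetical helpers, and matching on User-Agent only stops clients that announce themselves honestly.

```python
from typing import Callable, Iterable
from wsgiref.util import setup_testing_defaults

# Hypothetical block list: lower-case substrings of User-Agent headers to refuse.
BLOCKED_AGENTS = ("baiduspider",)


def block_crawlers(app: Callable, blocked: Iterable[str] = BLOCKED_AGENTS) -> Callable:
    """Wrap a WSGI app so requests from matching user agents get 403 Forbidden."""
    tokens = tuple(b.lower() for b in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()  # case-insensitive match
        if any(token in agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)  # everyone else reaches the site

    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real .hk site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok\n"]


def status_for(app: Callable, user_agent: str) -> str:
    """Send one fake request with the given User-Agent and return the status line."""
    environ: dict = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = []
    app(environ, lambda status, headers: captured.append(status))
    return captured[0]


if __name__ == "__main__":
    site = block_crawlers(hello_app)
    print(status_for(site, "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"))
    print(status_for(site, "Mozilla/5.0 (compatible; Yahoo! Slurp)"))
```

Because the check only looks for the `baiduspider` substring, Yahoo's crawler and ordinary visitors pass straight through to the site.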
I'll report back with the results in a few weeks. In the meantime, if anyone knows of a "Webmaster Tools"-type area within Baidu, I'd be keen to hear about it. It would hopefully give us a little more insight into what is going on.
Cheers
I've had these sites for several years (4+), though, so maybe that has helped?
I also wonder whether the encoding might affect it. I may be completely off base here, but I use GB2312 and Big5 rather than UTF-8 throughout.
If you implement UTF-8 correctly, then in most cases it won't be a problem for the spiders or for the majority of your visitors. In a market like China, though, you still get the occasional visitor using very old software, and for some site owners that is an issue they would prefer to avoid. Using the GB2312 or Big5 charsets is a safe bet.
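Whichever charset you pick, the page needs to be encoded in it and declare it consistently. The sketch below serves each site in its own legacy charset and sends the same charset in both the HTTP `Content-Type` header and the `<meta>` tag, falling back to UTF-8. The host names and the `SITE_CHARSETS` mapping are hypothetical, and `render` is an illustrative helper, not part of any framework.

```python
from typing import Dict, Tuple

# Hypothetical host names; substitute your own .cn and .hk domains.
SITE_CHARSETS: Dict[str, str] = {
    "example.cn": "gb2312",  # Simplified Chinese site
    "example.hk": "big5",    # Traditional Chinese site
}


def render(host: str, body_html: str) -> Tuple[Dict[str, str], bytes]:
    """Return (HTTP headers, encoded page) with matching charset declarations."""
    charset = SITE_CHARSETS.get(host, "utf-8")  # fall back to UTF-8
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
    page = f"<html><head>{meta}</head><body>{body_html}</body></html>"
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # Strict encoding: raises UnicodeEncodeError if the text contains a
    # character the chosen charset cannot represent, rather than mangling it.
    return headers, page.encode(charset)


if __name__ == "__main__":
    for name in ("gb2312", "big5", "utf-8"):
        # "中文" is two characters: 2 bytes each in GB2312/Big5, 3 each in UTF-8.
        print(name, len("中文".encode(name)))
    headers, page = render("example.hk", "中文")
    print(headers["Content-Type"])
```

Since GB2312 is essentially a Simplified character set and Big5 a Traditional one, the strict encode step is a cheap way to catch stray characters from the other script before a page goes out.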