
Search Engine Spider and User Agent Identification Forum

    
Baidu Block
outland88




msg:3667302
 4:08 am on Jun 5, 2008 (gmt 0)

This is probably a dumb/newbie question, but do many of you block Baidu? I've wondered about this for a couple of years. Basically, every time I see them in my logs the next entry is Googlebot, and strangely they both have the same ID number. I sometimes wonder if it's the same server for both. Being a US site, I don't see much value in them. Same with Shopwiki.

I have a pretty extensive setup to block spam, many IP blocks for those areas, plus the scripts you suggested quite a few years back, incrediBILL, but I've wondered about Baidu's value, if any.

 

wilderness




msg:3667562
 1:21 pm on Jun 5, 2008 (gmt 0)

Here's some very OLD threads [google.com]

Some more recent threads [google.com]

outland88




msg:3667807
 6:08 pm on Jun 5, 2008 (gmt 0)

I looked at many of those before I posted. Basically, some see it like I do for the US: little value, but they don't seem to be doing much wrong on initial inspection.

keyplyr




msg:3670219
 8:13 am on Jun 9, 2008 (gmt 0)

If I've learned anything in my 10 years working on the internet it's that ya never know what will develop down the road. Resources I never felt were important turned into essentials and stable looking giants have fallen.

The Asian market may indeed be crucial for underwriting interests here in the Western regions someday and seeding your presence now may develop into an asset later.

Cyclob




msg:3673690
 5:00 am on Jun 13, 2008 (gmt 0)


System: The following message was spliced on to this thread from: http://www.webmasterworld.com/search_engine_spiders/3673688.htm [webmasterworld.com] by incredibill - 10:00 pm on June 12, 2008 (PST -8)


I have a website that has been translated into many languages, including Traditional and Simplified Chinese.

As we know, Traditional Chinese is widely used in Hong Kong, while Simplified Chinese is mainly used in China itself.

The Baidu search engine only supports Simplified Chinese characters and automatically translates my Traditional Chinese website into Simplified.

This is why Baidu considers these two sites duplicate content, which they actually are not. Baidu is now dropping my Simplified site instead of the Traditional one; I think it should be the other way around.

They have also stopped crawling the Simplified site and reduced its indexed pages to 4 results!

I want my Simplified site to be found by internet users in China, who use Baidu as their main search engine, so I'm thinking of blocking the Baidu spider from crawling my Traditional site so that it turns to the Simplified one instead.

My question is how to block the Baidu spider, since I've heard somewhere that it doesn't obey robots.txt.

Any suggestions? Please help.

keyplyr




msg:3673847
 9:10 am on Jun 13, 2008 (gmt 0)

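If your site is on an Apache server, you can block it using mod_rewrite via .htaccess:

# Turn on the rewrite engine
RewriteEngine On
# Match user agents that start with "Baiduspider", ignoring case
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC]
# Return 403 Forbidden for every matching request
RewriteRule .* - [F]

I included [NC], which allows for case differences, since at least one of the Baidu bots uses "BaiDuSpider".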

Other variants are:

BaiduImagespider+(+http://www.baidu.jp/search/s308.html)

Baiduspider+(+http://help.baidu.jp/system/05.html)

Baiduspider+(+http://www.baidu.com/search/spider.htm)

Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
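
Note that a pattern anchored as ^Baiduspider would not match the BaiduImagespider variant listed above, and it blocks the spider on every host served by that .htaccess. A minimal sketch of one way to cover both points follows. The hostname zh-tw.example.com is a hypothetical stand-in for the Traditional Chinese site, and the looser user-agent pattern is an assumption based only on the variants listed in this thread, not a complete list of Baidu user agents.

```apache
RewriteEngine On

# Assumption: zh-tw.example.com is a placeholder for the Traditional Chinese host.
# Only requests for that host are considered; other hosts are left alone.
RewriteCond %{HTTP_HOST} ^zh-tw\.example\.com [NC]

# Match user agents starting with "Baidu", then optional letters, then "spider"
# (covers Baiduspider, BaiDuSpider and BaiduImagespider thanks to [NC]).
RewriteCond %{HTTP_USER_AGENT} ^Baidu[a-z]*spider [NC]

# Both conditions are ANDed by default; matching requests get 403 Forbidden.
RewriteRule .* - [F]
```

Dropping the HTTP_HOST condition gives a site-wide block. Because this relies on the User-Agent header, it only stops bots that identify themselves honestly.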

