
Forum Moderators: goodroi


Baiduspider - how do I keep it out?

Having some difficulty blocking this spider

     
1:41 am on Aug 29, 2008 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 30, 2007
posts: 90
votes: 0


I had this entry in my robots.txt file, which didn't seem to do the trick:

User-agent: baiduspider
Disallow: /

So I added this one, and it appeared to keep the bot at bay:

User-agent: baiduspider+
Disallow: /

Looking at my logs this morning, I found the spider back again trying to request each of the 50,000+ pages I have.

It's using this user agent string:

Baiduspider+(+http://www.baidu.com/search/spider.htm)

from IP address 220.181.32.53

My robots.txt file has become very long (20 kB), and the entry is toward the bottom. Would this cause it to be skipped?

What else can I try?

6:54 pm on Aug 29, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3357
votes: 271


Hi ChicagoFan67,

This issue was discussed in 2005. Sorry to hear the issue is still around. The good news is that the solution suggested then will still work - use .htaccess or ISAPI Rewrite to deny it. Check out the old discussion:
[webmasterworld.com...]
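
Before moving to a server-side block, it may be worth ruling out a syntax problem in the robots.txt records themselves. Below is a minimal sketch that feeds the two Baidu records and the logged user-agent string to Python's standard `urllib.robotparser`. The names `ROBOTS_TXT`, `BAIDU_UA`, `is_allowed` and the path `/any-page.html` are made up for illustration, and the two records stand in for the full 20 kB file. It only shows how one standards-following parser reads the rule. It says nothing about how Baiduspider itself interprets or obeys robots.txt.

```python
from urllib.robotparser import RobotFileParser

# User-agent string copied from the log entry quoted in this thread.
BAIDU_UA = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

# Assumption: a trimmed stand-in for the real file, holding only the
# two Baidu records from the original post.
ROBOTS_TXT = """\
User-agent: baiduspider
Disallow: /

User-agent: baiduspider+
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser would let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical path; any path falls under "Disallow: /".
    print(is_allowed(ROBOTS_TXT, BAIDU_UA, "/any-page.html"))
```

Pasting the complete robots.txt into `ROBOTS_TXT` in place of the two records would show whether this parser still reaches the Baidu entry near the bottom of the file. If it reports the page as disallowed and the spider keeps requesting pages anyway, the records are probably not the problem, and denying the user agent at the server is the remaining option.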

1:44 pm on Sept 4, 2008 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 30, 2007
posts:90
votes: 0


Thank you for your reply, goodroi. I haven't been hit as hard as I first thought. It looks like Baidu is spidering links that it has found outside of my website.