Baiduspider - how do I keep it out?

Having some difficulty blocking this spider

     

ChicagoFan67

1:41 am on Aug 29, 2008 (gmt 0)

5+ Year Member



I had this entry in my robots.txt file, but it didn't seem to do the trick:

User-agent: baiduspider
Disallow: /

So I added this one, and it appeared to keep the bot at bay:

User-agent: baiduspider+
Disallow: /
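For what it's worth, both rules can be sanity-checked with Python's standard-library robots.txt parser (urllib.robotparser). This is only a sketch of how that particular parser reads the file: it takes the crawler's user-agent up to the first "/", lowercases it, and checks whether the rule's User-agent value appears inside it, so under this parser both `baiduspider` and `baiduspider+` already match the logged user-agent. It says nothing definitive about how Baidu's own crawler interprets robots.txt; the helper name `is_blocked` and the test path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two variants of the rule from the post above.
RULES = {
    "baiduspider": ["User-agent: baiduspider", "Disallow: /"],
    "baiduspider+": ["User-agent: baiduspider+", "Disallow: /"],
}

# User-agent string as it appeared in the log.
BAIDU_UA = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"


def is_blocked(robots_lines, user_agent, path="/"):
    """Return True if Python's robots.txt parser would refuse user_agent for path."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    # Record a fetch time; can_fetch() refuses everything if none is set.
    parser.modified()
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for name, lines in RULES.items():
        print(f"{name!r}: blocked={is_blocked(lines, BAIDU_UA, '/some-page.html')}")
```

If both variants report `blocked=True` here, the file itself is probably fine, and the remaining question is whether the crawler fetched and honoured it.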

Looking at my logs this morning, I found the spider was back, trying to request every one of my 50,000+ pages.

It's using this user agent string:

Baiduspider+(+http://www.baidu.com/search/spider.htm)

from IP address 220.181.32.53.
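To see how much of the crawl really is Baidu, and from which addresses, something like the following tallies matching requests per client IP. It assumes the server writes Apache's "combined" log format; the regex and the function name `baidu_hits_by_ip` are assumptions, so adjust them to your own log layout.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def baidu_hits_by_ip(lines):
    """Count requests whose user-agent mentions Baiduspider, grouped by client IP."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        # Skip lines that don't fit the assumed format or aren't Baidu.
        if match and "baiduspider" in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts
```

Usage would be along the lines of opening the access log and printing `baidu_hits_by_ip(log).most_common(10)`. Lines in a different format are silently skipped, so a zero count may mean the regex needs adjusting rather than that Baidu went away.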

My robots.txt file has become very long (20 kB), and the entry is toward the bottom. Could that cause it to be skipped?
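Whether a crawler reads only part of a long robots.txt is up to that crawler (Google, for one, documents a 500 KiB limit, far above 20 kB). As a quick check, this hypothetical helper reports the file's size in bytes and the line numbers of any User-agent lines naming Baiduspider, so you can confirm the group is actually present and see how far down it sits.

```python
def baidu_group_position(robots_text):
    """Return (size_in_bytes, line_numbers) for User-agent lines naming Baiduspider."""
    size = len(robots_text.encode("utf-8"))
    hits = []
    for number, line in enumerate(robots_text.splitlines(), start=1):
        # Split "Field: value" once; lines without a colon give an empty value.
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and "baiduspider" in value.lower():
            hits.append(number)
    return size, hits
```

Feed it the contents of the live robots.txt (ideally as fetched over HTTP, not the local copy) to make sure the server is actually serving the version you edited.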

What else can I try?

goodroi

6:54 pm on Aug 29, 2008 (gmt 0)

WebmasterWorld Administrator



Hi ChicagoFan67,

This issue was discussed back in 2005. Sorry to hear it is still around. The good news is that the solution suggested then still works: use .htaccess or ISAPI Rewrite to deny it. Check out the old discussion:
[webmasterworld.com...]
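The .htaccess (Apache) and ISAPI Rewrite (IIS) approaches refuse the request at the server instead of relying on the crawler to obey robots.txt. A site served from Python has neither, but the same idea can be sketched as WSGI middleware that returns 403 Forbidden when the user-agent contains a blocked token. The token list, `block_user_agents` and `hello_app` are hypothetical names for illustration, and since a user-agent is trivially changed, blocking by IP range is a common complement.

```python
# Hypothetical list of lowercase user-agent substrings to refuse.
BLOCKED_AGENT_TOKENS = ("baiduspider",)


def block_user_agents(app, tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app so requests from matching user-agents get 403 Forbidden."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else goes through to the real application.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_user_agents(hello_app)
```

Any WSGI server can then serve `application`; the 403 costs far less than rendering a page for each of the 50,000+ URLs.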

ChicagoFan67

1:44 pm on Sep 4, 2008 (gmt 0)

5+ Year Member



Thank you for your reply, goodroi. I haven't been hit as hard as I first thought. It looks like Baidu is spidering links to my pages that it found outside my website.
 
