Baiduspider - how do I keep it out. - Sitemaps, Meta Data, and robots.txt forum at WebmasterWorld

Forum Moderators: goodroi


Baiduspider - how do I keep it out.

having some difficulty blocking this spider

ChicagoFan67

1:41 am on Aug 29, 2008 (gmt 0)

I had this entry in my robots.txt file, which didn't seem to do the trick:

User-agent: baiduspider
Disallow: /

So I added this one, and it appeared to keep the bot at bay:

User-agent: baiduspider+
Disallow: /

Looking at my logs this morning, I found the spider back again trying to request each of the 50,000+ pages I have.

It's using this user agent string:

Baiduspider+(+http://www.baidu.com/search/spider.htm)

from IP address 220.181.32.53

My robots.txt file has become very long (20 KB), and the entry is toward the bottom. Would this cause it to be skipped?

What else can I try?

goodroi

6:54 pm on Aug 29, 2008 (gmt 0)

Hi ChicagoFan67,

This issue was discussed in 2005. Sorry to hear it is still around. The good news is that the solution suggested then will still work: use .htaccess or ISAPI_Rewrite to deny it. Check out the old discussion:
[webmasterworld.com...]
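For readers who cannot follow the link, here is a minimal sketch of what an .htaccess deny rule of this kind might look like. It is not taken from the older discussion; it assumes an Apache server with mod_rewrite enabled and .htaccess overrides allowed, and it matches on the "Baiduspider" user agent string quoted above. The robots.txt exception is an optional assumption, included so the bot can still read the Disallow rule.

```apache
# Sketch only: assumes Apache with mod_rewrite available and
# AllowOverride permitting rewrite directives in .htaccess.
RewriteEngine On

# Match any user agent containing "Baiduspider", case-insensitively ([NC]).
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]

# Optional assumption: still let the bot fetch robots.txt.
RewriteCond %{REQUEST_URI} !^/robots\.txt$

# Answer every other request from that agent with 403 Forbidden.
RewriteRule .* - [F]
```

Unlike robots.txt, which relies on the crawler choosing to obey it, this is enforced by the server, so the spider receives a 403 for each request regardless of how it reads the robots.txt file.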

ChicagoFan67

1:44 pm on Sep 4, 2008 (gmt 0)

Thank you for your reply, goodroi. I haven't been hit as hard as I first thought. It looks like Baidu is spidering links that it has found outside of my website.