
Forum Moderators: goodroi


Baiduspider - how do I keep it out?

having some difficulty blocking this spider

     
1:41 am on Aug 29, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2007
posts: 90
votes: 0


I had this entry in my robots.txt file, which didn't seem to do the trick:

User-agent: baiduspider
Disallow: /

So I added this one, and it appeared to keep the bot at bay:

User-agent: baiduspider+
Disallow: /

Looking at my logs this morning, I found the spider back again trying to request each of the 50,000+ pages I have.

It's using this user agent string:

Baiduspider+(+http://www.baidu.com/search/spider.htm)

from IP address 220.181.32.53

My robots.txt file has become very long (20 KB), and the entry is toward the bottom. Would this cause it to be skipped?

What else can I try?
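
One way to rule out a simple typo in the rule is to run the robots.txt text through a parser and test it against the exact user agent string from the logs. The sketch below uses Python's standard urllib.robotparser. The `examplebot` entry and the `is_allowed` helper are made up for illustration, and the result only shows how Python's parser matches the rule, not how Baidu's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: a hypothetical earlier entry, then the Baiduspider rule
# placed last, as in the post above.
ROBOTS_TXT = """\
User-agent: examplebot
Disallow: /private/

User-agent: baiduspider
Disallow: /
"""

# The user agent string reported in the logs.
LOGGED_AGENT = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"


def is_allowed(robots_txt, agent, url="/"):
    """Return True if this parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("Baiduspider allowed:", is_allowed(ROBOTS_TXT, LOGGED_AGENT))
    print("Other agent allowed:", is_allowed(ROBOTS_TXT, "Googlebot/2.1"))
```

If a check like this reports the page as disallowed, the rule itself is well formed under this parser's matching, and the remaining question is whether the crawler actually honors it.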

6:54 pm on Aug 29, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3080
votes: 67


Hi ChicagoFan67,

This issue was discussed in 2005. Sorry to hear the issue is still around. The good news is that the solution suggested then will still work: use .htaccess or ISAPI_Rewrite to deny it. Check out the old discussion:
[webmasterworld.com...]
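
The rules from the old thread are not reproduced here. The underlying idea is to refuse the request at the server based on its User-Agent header, instead of relying on the bot to obey robots.txt. Here is a minimal sketch of that idea as Python WSGI middleware; it is not the .htaccess or ISAPI_Rewrite rule itself, and `block_agents`, `demo_app` and `status_for` are hypothetical names used only for this example.

```python
from wsgiref.util import setup_testing_defaults

# Lowercase substrings to refuse; "baiduspider" matches the logged agent string.
BLOCKED_AGENT_SUBSTRINGS = ("baiduspider",)


def block_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so matching user agents get 403 Forbidden."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application (hypothetical) that always answers 200."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(agent):
    """Send one fake request with the given user agent; return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = agent
    seen = []
    block_agents(demo_app)(environ, lambda status, headers: seen.append(status))
    return seen[0]


if __name__ == "__main__":
    print(status_for("Baiduspider+(+http://www.baidu.com/search/spider.htm)"))
    print(status_for("Mozilla/5.0"))
```

A substring match on the header is easy to spoof and will also catch any other client that includes the same token, so treat it as a blunt filter.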

1:44 pm on Sept 4, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2007
posts:90
votes: 0


Thank you for your reply, goodroi. I haven't been hit as hard as I first thought. It looks like Baidu is spidering links that it has found outside of my website.
 
