I had this entry in my robots.txt file, which didn't seem to do the trick:
User-agent: baiduspider
Disallow: /
So I added this one, which appeared to keep the bot at bay:
User-agent: baiduspider+
Disallow: /
Looking at my logs this morning, I found the spider back again trying to request each of the 50,000+ pages I have.
It's using this user agent string:
Baiduspider+(+http://www.baidu.com/search/spider.htm)
from IP address 220.181.32.53.
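As a sanity check, the two rules can be tested locally against that user-agent string. This is only a sketch: it assumes Python's standard urllib.robotparser, which matches the agent name as a case-insensitive substring, and the helper name is_blocked and the sample path are made up for illustration. It says nothing about how Baidu's crawler actually reads the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent string copied from the server log.
BAIDU_UA = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"


def is_blocked(rules: str, user_agent: str, path: str = "/some-page.html") -> bool:
    """Return True if `rules` (robots.txt text) disallow `path` for `user_agent`.

    `path` is a hypothetical page; any path works since the rule is `Disallow: /`.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Try both spellings of the User-agent line from the robots.txt entries.
    for agent_line in ("baiduspider", "baiduspider+"):
        rules = f"User-agent: {agent_line}\nDisallow: /\n"
        print(agent_line, "->", is_blocked(rules, BAIDU_UA))
```

If both spellings report a block under this parser, the rule text itself is probably not the problem, which would point toward the crawler not reading or not honoring the file.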
My robots.txt file has become very long (20 kB), and the entry is toward the bottom. Could that cause it to be skipped?
What else can I try?