Baiduspider - how do I keep it out
having some difficulty blocking this spider
ChicagoFan67 msg:3734039 1:41 am on Aug 29, 2008 (gmt 0)
I had this entry in my robots.txt file which didn't seem to do the trick
So I added this one and it appeared to keep the bot at bay.
Looking at my logs this morning, I found the spider back again trying to request each of the 50,000+ pages I have.
It's using this user agent string:
from ip address 184.108.40.206
My robots.txt file has become very long (20 kB), and the entry is toward the bottom. Would this cause it to be skipped?
What else can I try?
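On the question of whether an entry near the bottom of a long robots.txt gets skipped: a parser that follows the robots exclusion convention selects a group by its User-agent line, not by its position in the file. The sketch below checks this with Python's standard `urllib.robotparser`. The rules, the `ExampleBot` names, and the example.com URL are made up for illustration and are not the entries from the post. It only shows how one compliant parser behaves and says nothing about what Baiduspider itself actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: 500 unrelated groups first, the Baiduspider group last.
filler = []
for i in range(500):
    filler += [f"User-agent: ExampleBot{i}", "Disallow: /private/", ""]

rules = filler + [
    "User-agent: *",
    "Disallow:",                 # empty Disallow: everything is allowed
    "",
    "User-agent: Baiduspider",
    "Disallow: /",               # block the whole site for this agent
]

parser = RobotFileParser()
parser.parse(rules)

url = "http://www.example.com/page.html"   # placeholder URL
baidu_allowed = parser.can_fetch("Baiduspider", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)
print("Baiduspider allowed:", baidu_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

If the first value comes back as disallowed even with the group at the very end, the file itself is fine and the problem is a crawler that ignores it, which robots.txt alone cannot fix.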
goodroi msg:3734568 6:54 pm on Aug 29, 2008 (gmt 0)
This issue was discussed in 2005. Sorry to hear the issue is still around. The good news is that the solution suggested then will still work: use .htaccess or ISAPI Rewrite to deny it. Check out the old discussion:
[ ...] webmasterworld.com
ChicagoFan67 msg:3738245 1:44 pm on Sep 4, 2008 (gmt 0)
Thank you for your reply, goodroi. I haven't been hit as hard as I first thought. It looks like Baidu is spidering links that it has found outside of my website.
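For readers who want to see the logic behind goodroi's suggestion of denying the spider at the server level, here is a rough sketch. It is not .htaccess or ISAPI Rewrite syntax. It is the same idea (refuse requests whose User-Agent header contains a given token) written as a small Python WSGI wrapper. The names `BLOCKED_AGENT_TOKENS`, `is_blocked`, and `deny_blocked_agents` are invented for this example, and matching on the substring "baiduspider" is an assumption, since the exact user agent string is not shown in the thread.

```python
# Assumption: blocking is done by a case-insensitive substring match.
BLOCKED_AGENT_TOKENS = ("baiduspider",)


def is_blocked(user_agent):
    """Return True if the User-Agent header contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def deny_blocked_agents(app):
    """Wrap a WSGI app so that blocked user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Unlike robots.txt, this kind of check does not depend on the crawler's cooperation, although a bot that changes its user agent string would still get through.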