Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Baiduspider - how do I keep it out.
having some difficulty blocking this spider

 1:41 am on Aug 29, 2008 (gmt 0)

I had this entry in my robots.txt file, but it didn't seem to do the trick:

User-agent: baiduspider
Disallow: /

So I added this one, and it appeared to keep the bot at bay:

User-agent: baiduspider+
Disallow: /
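For anyone wanting to compare the two entries above, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how that one parser matches the User-agent token against a user agent string; it says nothing about how Baidu's crawler actually reads the file. The two user agent strings and the helper name is_allowed are assumptions for illustration, not taken from the logs in this thread.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, path="/"):
    """Return True if this parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The two entries quoted in the post.
RULES_PLAIN = "User-agent: baiduspider\nDisallow: /\n"
RULES_PLUS = "User-agent: baiduspider+\nDisallow: /\n"

# Hypothetical user agent strings, for illustration only.
UA_WITH_PLUS = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
UA_BARE = "Baiduspider"

if __name__ == "__main__":
    for label, rules in (("baiduspider", RULES_PLAIN), ("baiduspider+", RULES_PLUS)):
        for ua in (UA_WITH_PLUS, UA_BARE):
            print(f"entry {label!r}, agent {ua!r}: allowed={is_allowed(rules, ua)}")
```

With this parser the match is a case-insensitive substring test, so the shorter token "baiduspider" covers both strings, while "baiduspider+" only covers agents that include the plus sign. Either way, robots.txt is only a request; a bot that ignores it has to be stopped at the server.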

Looking at my logs this morning, I found the spider back again trying to request each of the 50,000+ pages I have.

It's using this user agent string:


from ip address

My robots.txt file has become very long (20 kB), and the entry is toward the bottom. Would this cause it to be skipped?

What else can I try?



 6:54 pm on Aug 29, 2008 (gmt 0)

Hi ChicagoFan67,

This issue was discussed in 2005. Sorry to hear it is still around. The good news is that the solution suggested then will still work: use .htaccess or ISAPI Rewrite to deny the bot at the server. Check out the old discussion:
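The old thread covers the actual .htaccess and ISAPI Rewrite rules. For sites that run on a Python stack instead, the same idea (refuse the request at the server based on the User-Agent header, rather than relying on robots.txt) can be sketched as a small WSGI wrapper. This is a swapped-in illustration, not the .htaccess method itself; the names BLOCKED_AGENT_TOKENS, is_blocked, and deny_blocked_agents are hypothetical.

```python
# Tokens to look for in the User-Agent header (assumption: one is enough here).
BLOCKED_AGENT_TOKENS = ("baiduspider",)


def is_blocked(user_agent):
    """Case-insensitive substring match against the blocked tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def deny_blocked_agents(app):
    """Wrap a WSGI app so that blocked user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Usage would be along the lines of application = deny_blocked_agents(application) in the WSGI entry point. A bot that changes its user agent string will get past this, so blocking by IP range may still be needed.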


 1:44 pm on Sep 4, 2008 (gmt 0)

Thank you for your reply, goodroi. I haven't been hit as hard as I first thought. It looks like Baidu is spidering links to my pages that it has found outside of my website.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved