When I *did* allow it, the only authentic ranges I found were:
I only permit JP Baidu - although I suspect the results are shared across all Baidu-operating countries. The reason is that I need traffic from JP but do not particularly want it from CN.
Really, 126.96.36.199? That's FDC Servers in the USA.
I have "company" ranges listed as:
188.8.131.52 - 184.108.40.206 (JP)
220.127.116.11 - 18.104.22.168 (CN)
Permitted bots (I may be behind on these lists; a range-check sketch follows the list):
22.214.171.124 - 126.96.36.199
188.8.131.52 - 184.108.40.206
220.127.116.11 - 18.104.22.168
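If you maintain a similar allow-list, here is a minimal sketch of how a range check could look, using Python's standard ipaddress module. The ranges in it are documentation placeholders, not Baidu's; substitute whichever ranges you actually permit.

```python
# Minimal sketch: check whether a visiting IP falls inside an allow-listed
# crawler range. The ranges below are placeholders (RFC 5737 documentation
# addresses) - substitute the ranges you actually permit.
from ipaddress import ip_address, summarize_address_range

# Each entry is (first address, last address) of a permitted range.
PERMITTED_RANGES = [
    ("203.0.113.0", "203.0.113.255"),    # placeholder range A
    ("198.51.100.0", "198.51.100.127"),  # placeholder range B
]

# Pre-compute the networks once so per-request checks stay cheap.
PERMITTED_NETWORKS = [
    net
    for first, last in PERMITTED_RANGES
    for net in summarize_address_range(ip_address(first), ip_address(last))
]

def is_permitted(ip: str) -> bool:
    """Return True if the IP falls inside any permitted range."""
    addr = ip_address(ip)
    return any(addr in net for net in PERMITTED_NETWORKS)

if __name__ == "__main__":
    print(is_permitted("203.0.113.42"))  # True  (inside placeholder range A)
    print(is_permitted("192.0.2.1"))     # False (not allow-listed)
```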
I deny the Chinese version. Otherwise, Baidu has (so far) honored robots.txt: they take that file (some IPs more aggressively than others), but that's all they take.
I've had a lot of problems with Baidu, so much so that I wrote a script that firewalls any Class C that loads a page with a Baidu user-agent. Not a great long-term solution, as anyone knowing this could perform a simple DoS: load a single page with a faked Baidu user-agent and the 256 IPs around you are quickly blocked. But I'd had it with them hitting my sites. It's the only time I've had to firewall a major crawler rather than just blocking it with robots.txt (which doesn't seem to work).
An interesting side effect is that crawlers purporting to be Baiduspider get blocked too. :)
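For anyone tempted to do something similar, here is a rough sketch of the idea (not the poster's actual script) in Python, assuming a Linux host where the web server hands the client IP and user-agent to this hook and iptables is available to the process.

```python
# Rough sketch of the idea described above (not the poster's actual script):
# when a request arrives with a Baidu user-agent, firewall the whole /24
# ("Class C") around the client IP. Assumes a Linux host where iptables is
# available and this code runs with enough privilege to add rules.
import subprocess
from ipaddress import ip_interface

def block_class_c(client_ip: str, user_agent: str) -> None:
    """Drop the surrounding /24 if the user-agent claims to be Baiduspider."""
    if "baiduspider" not in user_agent.lower():
        return
    # e.g. 203.0.113.77 -> 203.0.113.0/24
    network = ip_interface(f"{client_ip}/24").network
    subprocess.run(
        ["iptables", "-I", "INPUT", "-s", str(network), "-j", "DROP"],
        check=True,
    )

# Example: a single spoofed request would take out 256 addresses at once,
# which is exactly the DoS risk mentioned above.
# block_class_c("203.0.113.77", "Mozilla/5.0 (compatible; Baiduspider/2.0)")
```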
|rather than just blocking it with robots.txt (which doesn't seem to work). |
FWIW, robots.txt doesn't block or deny anything; rather, it is a request asking compliant bots to honor your wishes.
.htaccess, on the other hand, is fully capable of denying access to a variety of visitors, using any number of methods and/or criteria.
I understand that robots.txt doesn't block a site. What I meant was that Baidu don't seem to respect my request in that file to stay off my site entirely. Thus, I moved towards blocking at the IP level.
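If you want to block (or allow) at the IP level without trusting the user-agent or a third-party list, the usual approach is forward-confirmed reverse DNS. A minimal sketch follows, assuming genuine Baidu crawl hosts reverse-resolve to names under baidu.com or baidu.jp; check Baidu's own documentation for the authoritative suffixes.

```python
# Minimal sketch of forward-confirmed reverse DNS for a claimed Baiduspider IP.
# Assumption: genuine Baidu crawl hosts reverse-resolve to names ending in
# .baidu.com or .baidu.jp (verify the authoritative suffixes with Baidu's docs).
import socket

BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")

def is_real_baiduspider(ip: str) -> bool:
    """Reverse-resolve the IP, check the suffix, then forward-confirm it."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False                      # no reverse DNS at all
    if not hostname.endswith(BAIDU_SUFFIXES):
        return False                      # not a Baidu hostname
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
    return ip in forward_ips              # forward lookup must map back to the IP

# Example (an address from the 180.76.5.x range mentioned later in the thread):
# is_real_baiduspider("180.76.5.59")
```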
You can verify Baidu Spider IPs by using the "Check IP" function on Botopedia.org.
It will also provide you with all the legitimate user-agent data for this and other bots.
Thanks. Looks to be a useful site if it really has all IPs for any given bot.
I ended up here looking for info on the 180.76.5.nnn range because of about two dozen requests for "robots.txt" from UA: Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
The same UA comes in from 22.214.171.124 and 126.96.36.199 (both 403s).
I have not seen Baidu anywhere for a long time, but this UA ONLY requests robots.txt, so maybe it is part of a tag team(?).
off topic. sorry.
|You can verify Baidu Spider IPs by using the "Check IP" function on Botopedia.org. |
Ever heard of 404? Well, this is it.
Great idea, but get your contact form fixed first. This should not be a place to promote affiliates.
Same as dstiles: Japan OK, CN blocked. I have a client who does over 60% of her retail business through JP; lots of traffic from there.