keyplyr

msg:4475786 | 5:03 am on Jul 15, 2012 (gmt 0) |
When I *did* allow it, the only authentic ranges I found were: 119\.63\.19[2-9]\. 180\.76\. 123\.125\.71\. 220\.181\.
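
For anyone who wants to test log entries against those prefixes, here is a minimal Python sketch. The patterns are copied from the post; the function name `looks_like_baidu` is made up for illustration, and the prefixes reflect what was observed in 2012, so they may well be stale.

```python
import re

# Prefix patterns as quoted in the post. They are anchored at the start
# of the string so an address such as "1180.76.x.x" cannot match.
BAIDU_PREFIXES = re.compile(
    r"^(?:119\.63\.19[2-9]\.|180\.76\.|123\.125\.71\.|220\.181\.)"
)

def looks_like_baidu(ip: str) -> bool:
    """Return True if the dotted-quad string starts with a listed prefix."""
    return BAIDU_PREFIXES.match(ip) is not None
```

A prefix match only says the address falls in one of the listed blocks; it does not prove the visitor is a real Baidu crawler.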
|
dstiles

msg:4475903 | 8:14 pm on Jul 15, 2012 (gmt 0) |
I only permit JP Baidu - although I suspect the results are shared across all Baidu-operating countries. Reason behind this is: I need traffic from JP but do not particularly want it from CN.

Really 204.45.133.74? That's FDC Server in USA.

I have "company" ranges listed as:
119.63.192.0 - 119.63.199.255 (JP)
180.76.0.0 - 180.76.255.255 (CN)

Permitted bots (I may be behind on these lists):

China:
61.135.169.32 - 61.135.169.32
61.135.190.1 - 61.135.190.254
123.125.66.0 - 123.125.66.255
123.125.71.0 - 123.125.71.255
180.76.5.0 - 180.76.6.255
220.181.7.0 - 220.181.7.255
220.181.108.0 - 220.181.108.255

Japan:
119.63.192.128 - 119.63.192.254
119.63.193.0 - 119.63.193.255
119.63.196.1 - 119.63.196.127
119.63.198.0 - 119.63.198.255
119.63.199.103 - 119.63.199.103
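
A rough Python sketch of how the permitted-bot lists above could be checked, using the standard `ipaddress` module. The first/last pairs are copied from this post; the names `PERMITTED` and `baidu_country` are hypothetical, and the lists may be out of date, as noted.

```python
import ipaddress

# (first, last) address pairs copied from the "Permitted bots" lists above.
PERMITTED = {
    "CN": [
        ("61.135.169.32", "61.135.169.32"),
        ("61.135.190.1", "61.135.190.254"),
        ("123.125.66.0", "123.125.66.255"),
        ("123.125.71.0", "123.125.71.255"),
        ("180.76.5.0", "180.76.6.255"),
        ("220.181.7.0", "220.181.7.255"),
        ("220.181.108.0", "220.181.108.255"),
    ],
    "JP": [
        ("119.63.192.128", "119.63.192.254"),
        ("119.63.193.0", "119.63.193.255"),
        ("119.63.196.1", "119.63.196.127"),
        ("119.63.198.0", "119.63.198.255"),
        ("119.63.199.103", "119.63.199.103"),
    ],
}

def baidu_country(ip: str):
    """Return "CN" or "JP" if ip is inside a permitted range, else None."""
    addr = ipaddress.IPv4Address(ip)
    for country, ranges in PERMITTED.items():
        for first, last in ranges:
            # IPv4Address objects compare numerically, so an inclusive
            # range check works even when the range is not a clean CIDR block.
            if ipaddress.IPv4Address(first) <= addr <= ipaddress.IPv4Address(last):
                return country
    return None
```

With this, `baidu_country("119.63.196.50")` gives `"JP"`, while an address outside every listed range, such as 204.45.133.74, gives `None`.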
|
tangor

msg:4476059 | 2:00 pm on Jul 16, 2012 (gmt 0) |
I deny the Chinese version. Otherwise, Baidu has (so far) honored robots.txt: they fetch that file (some IPs more aggressively than others), but that's all they take.
|
rowan194

msg:4485015 | 12:38 pm on Aug 15, 2012 (gmt 0) |
I've had a lot of problems with Baidu, so much so that I wrote a script that firewalls any class C (/24) range that loads a page with a Baidu user-agent. Not a great long-term solution, as anyone knowing this could perform a simple DoS (load a single page with a faked Baidu user-agent and the 256 IPs around you are quickly blocked), but I'd had it with them hitting my sites. It's the only time I've had to firewall a major crawler, rather than just blocking it with robots.txt (which doesn't seem to work). An interesting side effect is that crawlers purporting to be Baiduspider get blocked too. :)
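
The script itself was not posted, so the following is only a guess at the core step: working out which /24 to block from a single request. The function name `block_range_for` and the plain substring test on the user-agent are assumptions, and the sketch deliberately stops short of touching any firewall.

```python
import ipaddress

def block_range_for(ip: str, user_agent: str):
    """Return the enclosing /24 network if the UA mentions Baidu, else None."""
    if "baidu" not in user_agent.lower():
        return None
    # strict=False accepts a host address and returns the network around it.
    return ipaddress.ip_network(f"{ip}/24", strict=False)
```

Since the user-agent is trivially spoofed, anything built on this inherits the DoS weakness described in the post: one forged request is enough to get 256 addresses blocked.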
|
wilderness

msg:4485054 | 2:21 pm on Aug 15, 2012 (gmt 0) |
| rather than just blocking it with robots.txt (which doesn't seem to work.) |
| FWIW, robots.txt doesn't block or deny anything; it is a request to compliant bots to honor your wishes. .htaccess, on the other hand, is fully capable of denying access to a variety of visitors, using a variety of methods and/or criteria.
|
rowan194

msg:4485077 | 3:28 pm on Aug 15, 2012 (gmt 0) |
I understand that robots.txt doesn't block a site. What I meant was that Baidu don't seem to respect my requests in that file for them to not go anywhere on my site. Thus, I moved towards blocking at the IP level.
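
To illustrate what such a request looks like and how a compliant crawler would read it, here is a small Python sketch using the standard `urllib.robotparser` module. The robots.txt content is an assumed example, not the file from this post, and `allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed example file: ask Baiduspider to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(bot_name: str, path: str = "/") -> bool:
    """Ask the parser whether a bot with this name token may fetch path."""
    # bot_name is the short token (e.g. "Baiduspider"), not a full UA string.
    return parser.can_fetch(bot_name, path)
```

This only shows what a well-behaved parser concludes from the file. Nothing here stops a crawler that ignores the answer, which is why the block moved to the IP level.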
|
Igal Zeifman

msg:4485388 | 9:25 am on Aug 16, 2012 (gmt 0) |
Hi, you can verify Baiduspider IPs by using the "Check IP" function on Botopedia.org. It will also provide you with all legitimate user-agent data for this and other bots.
|
dstiles

msg:4485584 | 6:58 pm on Aug 16, 2012 (gmt 0) |
Thanks. Looks to be a useful site if it really has all IPs for any given bot.
|
not2easy

msg:4535715 | 5:55 pm on Jan 13, 2013 (gmt 0) |
I ended up here looking for info on the 180.76.5.nnn range because of about two dozen requests for "robots.txt" from UA:

Mozilla/5.0 (Windows NT 5.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2

The same UA comes in from 202.46.53.82 and 202.46.62.95 (both 403s). I have not seen Baidu anywhere for a long time, but this UA ONLY requests robots.txt, so maybe it is part of a tag team(?).
|
blend27

msg:4535732 | 7:10 pm on Jan 13, 2013 (gmt 0) |
off topic. sorry. | You can verify Baidu Spider IPs by using "Check IP" function in *otopedia.org. |
| Ever heard of a 404? Well, this is it. Great idea, but get your contact form fixed first. This should not be a place to promote affiliates.
|
blend27

msg:4535733 | 7:21 pm on Jan 13, 2013 (gmt 0) |
Same as dstiles: Japan OK, CN blocked. I have a client who does over 60% of her retail business through JP, with lots of traffic from there.
|