Baiduspider Crawl Ranges

verified IP ranges

         

keyplyr

10:23 pm on Dec 20, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




UA: Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

While this UA may be observed coming from various IP ranges assigned to ChinaNet, China Unicom, China Telecom or CNCGroup, only the ranges explicitly assigned to Baidu (some as crawl, some just registered to Baidu) should be accepted as authentic:

63.243.252.0/24
63.243.252.0 - 63.243.252.255

103.6.76.0/22
103.6.76.0 - 103.6.79.255

104.193.88.0/22
104.193.88.0 - 104.193.91.255

119.63.192.0/21
119.63.192.0 - 119.63.199.255

123.112.0.0/12
123.112.0.0 - 123.127.255.255

180.76.0.0/16
180.76.0.0 - 180.76.255.255

185.10.104.0/22
185.10.104.0 - 185.10.107.255
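
For anyone who wants to test an address against the list above, here is a minimal sketch using Python's standard `ipaddress` module. The ranges are copied exactly as posted (including the /12, which is questioned further down the thread); the function names `in_listed_baidu_range` and `span` are made up for illustration.

```python
import ipaddress

# Ranges copied from the list above, exactly as posted.
BAIDU_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "63.243.252.0/24",
    "103.6.76.0/22",
    "104.193.88.0/22",
    "119.63.192.0/21",
    "123.112.0.0/12",   # very broad; see the discussion later in the thread
    "180.76.0.0/16",
    "185.10.104.0/22",
)]


def in_listed_baidu_range(ip: str) -> bool:
    """Return True if ip falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BAIDU_RANGES)


def span(cidr: str) -> str:
    """Render a CIDR block as 'first - last', matching the list format above."""
    net = ipaddress.ip_network(cidr)
    return f"{net.network_address} - {net.broadcast_address}"
```

`span("119.63.192.0/21")` reproduces the second line of each pair in the list, which is a quick way to double-check a CIDR against its start and end addresses.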

lucy24

5:50 am on Dec 21, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The mention of Baidu reminds me that I just recently discovered something I'd never suspected: the Anglophone Baiduspider ("en-US" instead of "zh-cn, zh-tw"). Far as I know, they always come from 180.76.15, and they never request anything but the root (various sites).

I get fake Baiduspiders in utterly hilarious quantities. Most of the time they're stunningly inept; I don't even notice them unless I happen to be looking into logs or headers for some other reason.

Isn't 220.181.blahblah legitimate Baidu? I always thought it was. Ask for robots.txt and everything. (Don't obey it, of course--unless they're simply too stupid to understand block listings--but they do ask.)

keyplyr

6:00 am on Dec 21, 2016 (gmt 0)




> Isn't 220.181.blahblah legitimate Baidu?
Nope, registered to ChinaNet, hence the reason for this thread. However, there may be valid Baidu ranges I haven't listed. If so, please contribute.

keyplyr

11:24 am on Dec 21, 2016 (gmt 0)




So the way I see it, there are two likely explanations for this UA coming from ranges *not* assigned to Baidu:

• The agent is authentic but for some reason it is either not officially registered as using the sub-net of ranges at the parent or the registration is not propagating across whois databases.
- or -
• The agent is being spoofed by any number of imposters who, having had success at doing so, have continued this behavior for several years.

Either way, too risky to allow IMO.

dstiles

7:33 pm on Dec 22, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I allow baiduspider. The ranges I have listed for it (allowed as a spider!) are...

61.135.168.0 - 61.135.169.255
119.63.196.1 - 119.63.196.127
119.63.198.0 - 119.63.198.255
123.125.66.0 - 123.125.66.255
123.125.71.0 - 123.125.71.255
180.76.4.0 - 180.76.6.255
180.76.15.0 - 180.76.15.255
185.10.104.128 - 185.10.104.199
220.181.108.0 - 220.181.108.255

Some of those ranges are entire baidu from which I allow baidu UAs.

My baidu ranges (ie non-spider or partly-spider-inclusive) are...

63.243.252.0 - 63.243.255.255
103.6.76.0 - 103.6.79.255
104.193.88.0 - 104.193.91.255
119.63.192.0 - 119.63.199.255
180.76.0.0 - 180.76.255.255
185.10.104.0 - 185.10.107.255

keyplyr - I'm not sure what you are implying with 123.112.0.0/12. That is far too indistinct. Is there a more specific range within it?
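
The two lists in this thread use different notations (CIDR versus start - end), so converting between them makes comparison easier. A small sketch with the standard `ipaddress` module follows; `to_cidrs` is a hypothetical helper name. It also shows why a /12 is a very coarse net.

```python
import ipaddress


def to_cidrs(first: str, last: str) -> list[str]:
    """Convert a 'first - last' range into the CIDR blocks that cover it exactly."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in nets]


# A range that is not aligned to a single block needs more than one CIDR.
print(to_cidrs("180.76.4.0", "180.76.6.255"))

# Number of addresses covered by the /12 under discussion.
print(ipaddress.ip_network("123.112.0.0/12").num_addresses)
```

A range such as 119.63.196.1 - 119.63.196.127 does not start on a block boundary, so it expands into several small CIDR blocks rather than one.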

keyplyr

9:58 pm on Dec 22, 2016 (gmt 0)




@dstiles - good catch. I shouldn't have listed that entire /12 as verified Baidu in the above list. I have it all allowed to use the Baiduspider UA because there are several sub-nets within that China Unicom /12 registered to Baidu, and I probably thought it succinct to do so.

Some of the ranges you allow the Baiduspider UA access from are not registered to Baidu. Baiduspider requests from those ranges *may* or *may not* be from Baidu. They may be from imposters.

lucy24

11:35 pm on Dec 22, 2016 (gmt 0)




I attacked raw logs with a sledgehammer that looked like this:
^(?!220\.181|180\.76|185\.10\.|123\.12\d|119\.63).+Mozilla/5.0 \(compatible; Baiduspider
That's the beginning of the real Baiduspider UA
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
not to be confused with the wildly popular spoofer that calls itself
compatible;Baiduspider/2.0; +http://www.baidu.com/search/spider.html
[sic]
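
A rough sketch of how a pattern like the one above could be run over raw log lines in Python. It assumes each log line begins with the client IP, and it escapes the dot in `Mozilla/5\.0`, which is a slight tightening of the pattern as posted. The name `suspicious_lines` is invented for this example.

```python
import re

# IP prefixes treated as "probably Baidu" in the pattern above.
ALLOWED_PREFIXES = r"220\.181|180\.76|185\.10\.|123\.12\d|119\.63"

# Match lines that do NOT start with an allowed prefix but DO carry
# the start of the real Baiduspider UA.
PATTERN = re.compile(
    r"^(?!" + ALLOWED_PREFIXES + r").+Mozilla/5\.0 \(compatible; Baiduspider"
)


def suspicious_lines(lines):
    """Return log lines with the Baiduspider UA from outside the allowed prefixes."""
    return [line for line in lines if PATTERN.search(line)]
```

Because the pattern requires the `Mozilla/5.0 (` lead-in, the spoofer that sends only `compatible;Baiduspider/2.0; ...` is not picked up by it and would need a separate check.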

The Baiduspider UA from non-Baidu ranges requested robots.txt (only) from
111.13.102 (spanning March-June of this year)
125.39.78 (November 2011)
The 111.13 IP yields assorted other robots and spoofers. I haven't seen 125.39 anywhere else; it seems to have gone out of business in 2011.

I think this means that various Chinese robots put on a Baiduspider mask in order to ask for robots.txt without arousing suspicion, in hopes that this would yield information about what files to ask for. In my case they must not have found anything tempting, presumably along the lines of
Disallow: /directory-specific-to-some-major-CMS

keyplyr

5:15 am on Dec 23, 2016 (gmt 0)




RE: 123.112.0.0 - 123.127.255.255

The only Baidu range I can *now* find in this /12 is: 123.125.71.0 - 123.125.71.255

Registrations change, which is why it is important to validate ranges periodically.

[edited by: keyplyr at 11:59 pm (utc) on Dec 23, 2016]

dstiles

8:56 pm on Dec 23, 2016 (gmt 0)




keyplyr...
> Some of the ranges you allow the Baiduspider UA access from are not registered to Baidu

Just re-checked, and they all have baidu DNS ranges (eg 185.10.104.0 - 185.10.107.255) or DNS records (eg baiduspider-220...crawl.baidu...). There may be more but those are my current lists.

This was derived using the Linux Network Tools whois and lookup tabs.

keyplyr

11:05 pm on Dec 23, 2016 (gmt 0)




Well these have no mention of Baidu in the registration:
61.135.168.0 - 61.135.169.255 registered to China Unicom
123.125.66.0 - 123.125.66.255 registered to China Unicom Beijing
220.181.108.0 - 220.181.108.255 registered to Chinanet (China Telecom)

However, there have always been discrepancies between whois look-up tools. I think some don't update their database often enough. I won't allow any range that isn't registered explicitly to Baidu, whether Baiduspider poaches the range or not. China being what it is, I don't trust it.

dstiles

7:23 pm on Dec 24, 2016 (gmt 0)




Check the individual DNS IP records, keyplyr. Many IP ranges do not define the actual user. E.g. Google shows googlebot for many IPs but not for the actual IP range allocation.

As I said, if you are using linux-ish then run Network Tools. Or...

dig -x 123.125.66.30

...should do it - I think dig works on windows and mac.

Results for that dig...

;; QUESTION SECTION:
;30.66.125.123.in-addr.arpa. IN PTR

;; ANSWER SECTION:
30.66.125.123.in-addr.arpa. 7162 IN PTR baiduspider-123-125-66-30.crawl.baidu.com.

;; AUTHORITY SECTION:
66.125.123.in-addr.arpa. 5037 IN NS ns3.crawl.baidu.com.
66.125.123.in-addr.arpa. 5037 IN NS ns2.crawl.baidu.com.
66.125.123.in-addr.arpa. 5037 IN NS ns1.crawl.baidu.com.
66.125.123.in-addr.arpa. 5037 IN NS ns4.crawl.baidu.com.
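
The same reverse lookup can be scripted. The sketch below uses Python's `socket.gethostbyaddr` for the PTR lookup and then adds a forward lookup of the returned name to confirm it resolves back to the same IP. That forward-confirmation step is an addition not discussed in this thread, and the `.crawl.baidu.com` suffix is an assumption taken from the dig output above. The function name and its injectable `reverse`/`forward` parameters are hypothetical, there mainly so the logic can be exercised without network access.

```python
import socket


def looks_like_baidu_crawler(ip, reverse=socket.gethostbyaddr,
                             forward=socket.gethostbyname_ex,
                             suffix=".crawl.baidu.com"):
    """Reverse-resolve ip, check the hostname suffix, then forward-confirm it."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup, like `dig -x`
    except OSError:
        return False                       # no PTR record or lookup failure
    if not hostname.lower().rstrip(".").endswith(suffix):
        return False                       # PTR does not name a Baidu crawl host
    try:
        addresses = forward(hostname)[2]   # A records for the claimed hostname
    except OSError:
        return False
    return ip in addresses                 # must resolve back to the same IP
```

Called as `looks_like_baidu_crawler("123.125.66.30")` it performs live DNS lookups, so the result depends on whatever the PTR and A records say at the time.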

keyplyr

8:40 pm on Dec 24, 2016 (gmt 0)




Thanks for explaining that, dstiles. However, as I said, until Baidu is registered in the standard whois, I'm not allowing the range.

keyplyr

2:32 am on Dec 25, 2016 (gmt 0)




Here's an example of an imposter:
188.129.143.** - - [24/Dec/2016:16:32:36 -0800] "GET / HTTP/1.1" 403 4987 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
Same UA as authentic Baiduspider.

Host: magtinet.ge
Parent: Caucasus Online (ISP) Georgia
188.129.128.0 - 188.129.159.255
188.129.128.0/19

lucy24

2:44 am on Dec 25, 2016 (gmt 0)




> Parent: Caucasus Online (ISP) Georgia
Did you happen to notice what, if any, language it claimed to speak?

keyplyr

4:07 am on Dec 25, 2016 (gmt 0)




No, I didn't, sorry. Not grabbing headers on this site (client's) but I did notice a slight accent.

keyplyr

2:13 am on Dec 31, 2016 (gmt 0)




Another imposter:
59.49.253.** - - [30/Dec/2016:11:12:03 -0800] "HEAD / HTTP/1.1" 403 3412 "http://www.example.cn/" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html\xa3\xa9"
Host: Hainan-telecom, CN (broadband)
59.49.252.0 - 59.49.255.255
59.49.252.0/22
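
The two imposters logged in this thread differ in one useful way: the first sends the authentic UA byte for byte, while the second ends in `\xa3\xa9` instead of a closing parenthesis. A strict string comparison would catch the second kind. The sketch below assumes the combined log format seen in the lines above; `LOG_LINE`, `user_agent` and `ua_is_exact_baiduspider` are hypothetical names.

```python
import re

AUTHENTIC_UA = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
                "+http://www.baidu.com/search/spider.html)")

# Combined log format: ip ident user [time] "request" status size "referer" "ua"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def user_agent(line):
    """Extract the UA field from a combined-format log line, or None."""
    match = LOG_LINE.match(line)
    return match.group("ua") if match else None


def ua_is_exact_baiduspider(line):
    """True only if the UA matches the authentic string character for character."""
    return user_agent(line) == AUTHENTIC_UA
```

An exact UA match says nothing about where the request came from, so it only complements the IP range or DNS checks; the first imposter above would pass this test.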