
Forum Moderators: Ocean10000 & incrediBILL


Googlebot Crawling from China IPs

Make sure you allow 203.208.60.1-203.208.60.255

     

incrediBILL

10:16 pm on Jul 25, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Testing Google IPs just got a whole lot more interesting.

The range 203.208.60.1-203.208.60.255 is from China and all those IPs have the proper reverse DNS crawl-203-208-60-nnn.googlebot.com.
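For anyone who wants to test that naming pattern against their own logs, here is a minimal sketch in Python. The helper names `expected_crawl_hostname` and `ptr_matches_pattern` are made up for illustration, and it only checks the crawl-a-b-c-d.googlebot.com form described above. It does no DNS lookup of its own.

```python
import ipaddress


def expected_crawl_hostname(ip: str) -> str:
    """Build the crawl-a-b-c-d.googlebot.com name an IPv4 address should have."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return "crawl-" + str(addr).replace(".", "-") + ".googlebot.com"


def ptr_matches_pattern(ip: str, ptr_name: str) -> bool:
    """Compare a PTR name (from a log or a lookup) with the expected pattern."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    return ptr_name.rstrip(".").lower() == expected_crawl_hostname(ip)


print(expected_crawl_hostname("203.208.60.1"))
```

A matching name only shows that the PTR record looks right. It says nothing about whether the forward lookup agrees.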

Why 203.208.60.0 was skipped, I have no clue.

Here's the whole block allocated to GoogleCN:

inetnum: 203.208.32.0 - 203.208.63.255
netname: GOOGLECN
country: CN
admin-c: ZM657-AP
tech-c: ZM657-AP
status: ALLOCATED PORTABLE
mnt-by: MAINT-CNNIC-AP
mnt-lower: MAINT-CNNIC-AP
mnt-routes: MAINT-CNNIC-AP
mnt-irt: IRT-CNNIC-CN
changed: ipas@cnnic.cn 20110412
source: APNIC

route: 203.208.32.0/19
descr: FM China Network
origin: AS24424
notify: nst@corp.ganji.com
mnt-by: MAINT-CNNIC-AP
changed: nst@corp.ganji.com 20060612
source: APNIC
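A quick way to sanity-check the numbers in that whois record is Python's standard `ipaddress` module. This is only a sketch. The variable names are illustrative, and the addresses are the ones quoted in this thread.

```python
import ipaddress

# The route object from the whois record.
block = ipaddress.ip_network("203.208.32.0/19")
print(block[0], "-", block[-1])  # first and last address of the /19
print(block.num_addresses)

# The crawler range reported in this thread.
first = ipaddress.ip_address("203.208.60.1")
last = ipaddress.ip_address("203.208.60.255")
print(first in block and last in block)

# The range starts at .1 rather than .0, so it is not a single CIDR block.
# summarize_address_range() yields the minimal list of networks covering it.
crawl_cidrs = list(ipaddress.summarize_address_range(first, last))
for net in crawl_cidrs:
    print(net)
```

If your firewall only takes CIDR notation, the list printed at the end is what an exact match for .1 through .255 would need. Allowing 203.208.60.0/24 instead is simpler but also lets in the .0 address.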

This appears to be legit so do what you must to make sure you're not bouncing Googlebot!

Now I have to punch a hole in my Great Firewall of China which, due to both email and web abuse, is in an actual iptables firewall, not a soft firewall like everything else.

Hat tip to new member Igal Zeifman [webmasterworld.com] for pointing me at this new development.

keyplyr

11:05 pm on Jul 25, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Is the UA exactly the same?

lucy24

12:44 am on Jul 26, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Is this strictly for Google China, that is, the SERPs you get if you are in China and go to google dot com? If so, I don't see any particular reason for allowing them in, since the humans viewing the search results will themselves be blocked.

Don't know about your site, but I don't think that an inability to access mine is going to be the last straw that leads someone to emigrate or take dramatic political action ;) ("OK, that does it! If I'm not allowed to read about how to say 'weed whacker' in Berber, I'm moving to Thailand.")

incrediBILL

12:54 am on Jul 26, 2012 (gmt 0)




The site being crawled was not a Chinese site. Block what you want, but this appears to be a legit range for Googlebot.com, and it isn't new either; that's the shocker. Maybe it's only crawling some non-Asian sites, not all. No clue.

I'm just adding it to the list of allowed IPs in the firewall and will watch for any further activity.

keyplyr

1:07 am on Jul 26, 2012 (gmt 0)





Bill, was the Googlebot UA exactly the same?

incrediBILL

1:51 am on Jul 26, 2012 (gmt 0)




was the Googlebot UA exactly the same?


I think so.

It didn't hit my site directly. I'm working with 3rd-party info, and all I know is that it ID'd itself as Googlebot/2.1. I asked around about the IPs and they appear to be legit, but I don't have any 100% official confirmation.

I get some feeling like there's some big secret we've not been let in on yet, like a data center is being relocated outside the US for cost reduction perhaps.

keyplyr

2:03 am on Jul 26, 2012 (gmt 0)






Aside: 2 years ago I saw a Googlebot come from a range assigned to a Brazilian telco. It was blocked due to the unverified IP range. It kept coming back requesting 100+ pages, all blocked. A few days later my indexed pages total dropped by a hundred+ at Google WT.

I posted the strange event here at WW but got arguments that it could not have happened, but it did. I have since come to the conclusion that it was obviously a true Googlebot, but either it inadvertently got on this Brazilian range somehow, or the anomaly was at my server/router/switches/etc (which they denied of course.)

Stuff happens that can't be explained sometimes. If you/we never see another occurrence of Googlebot coming from this same Chinese range, then it may be one of those.

incrediBILL

4:23 am on Jul 26, 2012 (gmt 0)




2 years ago I saw a Googlebot come from a range assigned to a Brazilian telco.


What are the IPs, do you still have the info?

What about reverse DNS?

Google swears only legit crawlers have a reverse DNS of crawl-nnn-nnn-nnn-nnn.googlebot.com, which will match the forward DNS as well. Anything that doesn't meet those criteria I've been dumping for 6 years with no ill effects.
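The reverse-then-forward check described above can be sketched in Python roughly like this. The function name `is_verified_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are my own assumptions, there so the logic can be tried without live DNS. By default it uses the standard `socket` calls. It accepts only the googlebot.com suffix mentioned in this thread.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS: PTR must end in .googlebot.com and
    the forward lookup of that name must include the original IP."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Check the suffix with its leading dot so that names like
    # "googlebot.com.evil.example" or "notgooglebot.com" do not pass.
    if not host.endswith(".googlebot.com"):
        return False

    try:
        addresses = forward_lookup(host)
    except OSError:
        return False

    # The forward lookup must point back at the address we started from.
    return ip in addresses
```

In practice you would call `is_verified_googlebot("203.208.60.1")` and cache the answer, since doing two DNS lookups per request gets slow quickly.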

Are you sure it wasn't some proxy site? Googlebot can do some wacky things with proxies.

keyplyr

4:57 am on Jul 26, 2012 (gmt 0)




No, I no longer have the logs. As I said, I started a thread about it here at WW and got some very abrupt replies calling BS. Regardless, it was an authentic Googlebot since I lost indexing on the exact pages it was getting 403'd. Took a couple weeks to get those pages re-indexed.

Reverse DNS said it was some Brazilian telco, not a Googlebot host. I said all this in the above post.

Anyway, just an example of some screwy behavior that goes against the rules but does in fact happen sometimes.

keyplyr

5:38 am on Jul 26, 2012 (gmt 0)





Here's that thread (thanks to wilderness):

[webmasterworld.com...]

g1smd

7:07 am on Jul 26, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I get some feeling like there's some big secret we've not been let in on yet, like a data center is being relocated outside the US for cost reduction perhaps.

Half of Google's datacentres are outside the US.

MxAngel

7:22 am on Jul 26, 2012 (gmt 0)



Although I've had that CIDR range listed as Googlebot for a while, the only hit I've had from that range seems to be from a human.

IP: 203.208.61.240

Header Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.54 Safari/536.5

bhukkel

8:59 am on Jul 26, 2012 (gmt 0)



Also the range: 203.208.32.0 - 203.208.63.255

This is also a Google CN range.

dstiles

7:42 pm on Jul 26, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I've had that range blocked for about 15 months - it's China, it's a bot, ergo it's blocked.

The only G bots I (reluctantly) allow are from USA. All else G is blocked.

Oh, and they lie about rDNS. At least, for non-crawl bots. I've been seeing a lot of verification bots recently and none of them have appropriate DNS.

lucy24

10:19 pm on Jul 26, 2012 (gmt 0)




The site being crawled was not a Chinese site.

Other way around: crawling from China shows you what viewers in China see. The variable is the user's location, not the site's location.

There have been earlier threads about Googlebot being based in the US, making it impossible to tell what your non-US visitors will see. So it shouldn't be surprising if every country has a Google range tucked away somewhere, crawling all the same sites as the US-based Googlebot.

dstiles

8:16 pm on Jul 27, 2012 (gmt 0)




Surely we here would have registered the IP ranges by now if that were so?

I have only this one Chinese range outside of the US, no other. Had there been valid bot hits from consistent IP ranges I'm sure I would have spotted it by now. Granted, they may well be using real browsers from non-G IPs, but I've seen no G-bot UA that couldn't be attributed to a scrape attempt or similar from a non-G source.

Having said that, of course, there are known instances of Google people outside of the US using US IP ranges for their bots - e.g. the Mocality scandal used 74.125.0.0/16.
 
