

Search Engine Spider and User Agent Identification Forum

    
Googlebot Crawling from China IPs
Make sure you allow 203.208.60.1-203.208.60.255
incrediBILL




msg:4478892
 10:16 pm on Jul 25, 2012 (gmt 0)

Testing Google IPs just got a whole lot more interesting.

The range 203.208.60.1-203.208.60.255 is from China and all those IPs have the proper reverse DNS crawl-203-208-60-nnn.googlebot.com.

Why 203.208.60.0 was skipped, I have no clue.

Here's the whole block allocated to GoogleCN

inetnum: 203.208.32.0 - 203.208.63.255
netname: GOOGLECN
country: CN
admin-c: ZM657-AP
tech-c: ZM657-AP
status: ALLOCATED PORTABLE
mnt-by: MAINT-CNNIC-AP
mnt-lower: MAINT-CNNIC-AP
mnt-routes: MAINT-CNNIC-AP
mnt-irt: IRT-CNNIC-CN
changed: ipas@cnnic.cn 20110412
source: APNIC

route: 203.208.32.0/19
descr: FM China Network
origin: AS24424
notify: nst@corp.ganji.com
mnt-by: MAINT-CNNIC-AP
changed: nst@corp.ganji.com 20060612
source: APNIC
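As a quick sanity check on the records above, here is a minimal Python sketch using the standard `ipaddress` module. It collapses the `inetnum` range into CIDR form, which should agree with the `route:` line, and confirms that the crawl range falls inside the allocation. The variable names are just for illustration.

```python
import ipaddress

# The inetnum boundaries from the APNIC record above.
first = ipaddress.ip_address("203.208.32.0")
last = ipaddress.ip_address("203.208.63.255")

# Collapse the start-end range into the smallest list of CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))

# Check that an address from the crawling range sits inside the allocation.
crawl_ip = ipaddress.ip_address("203.208.60.1")
inside = any(crawl_ip in block for block in blocks)

print([str(block) for block in blocks], inside)
```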

This appears to be legit so do what you must to make sure you're not bouncing Googlebot!

Now I have to punch a hole in my Great Firewall of China which, due to both email and web abuse, is in an actual iptables firewall, not a soft firewall like everything else.
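For anyone in the same position, a rough sketch of how the allow rules could be generated. It assumes the block rules live in the `INPUT` chain and that inserting `ACCEPT` rules at the top of that chain is enough to get ahead of them; neither detail is stated above, and the helper name `accept_rules` is made up for this example. Because the range starts at .1 rather than .0, it splits into several CIDR blocks instead of a single /24.

```python
import ipaddress


def accept_rules(first, last, chain="INPUT"):
    """Build iptables commands that accept traffic from first..last inclusive.

    chain is an assumption; adjust it to wherever the block rules live.
    """
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    # -I inserts at the top of the chain so the rule is evaluated before later drops.
    return [f"iptables -I {chain} -s {net} -j ACCEPT" for net in nets]


for rule in accept_rules("203.208.60.1", "203.208.60.255"):
    print(rule)
```

If allowing 203.208.60.0 as well is acceptable, a single rule for 203.208.60.0/24 would cover the same ground with less clutter.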

Hat tip to new member Igal Zeifman [webmasterworld.com] for pointing me at this new development.

 

keyplyr




msg:4478899
 11:05 pm on Jul 25, 2012 (gmt 0)

Is the UA exactly the same?

lucy24




msg:4478907
 12:44 am on Jul 26, 2012 (gmt 0)

Is this strictly for Google China-- that is, the SERPs you get if you are in China and go to google dot com? If so, I don't see any particular reason for allowing them in, since the humans viewing the search results will themselves be blocked.

Don't know about your site, but I don't think that an inability to access mine is going to be the last straw that leads someone to emigrate or take dramatic political action ;) ("OK, that does it! If I'm not allowed to read about how to say 'weed whacker' in Berber, I'm moving to Thailand.")

incrediBILL




msg:4478909
 12:54 am on Jul 26, 2012 (gmt 0)

The site being crawled was not a Chinese site. Block what you want, but this appears to be a legit range for googlebot.com, and it isn't new either; that's the shocker. Maybe it's only crawling some non-Asian sites rather than all of them. No clue.

I'm just adding it to the list of allowed IPs in the firewall and will watch for any further activity.

keyplyr




msg:4478910
 1:07 am on Jul 26, 2012 (gmt 0)


Bill, was the Googlebot UA exactly the same?

incrediBILL




msg:4478913
 1:51 am on Jul 26, 2012 (gmt 0)

was the Googlebot UA exactly the same?


I think so.

It didn't hit my site directly. I'm working with 3rd party info, and all I know is that it ID'd itself as Googlebot/2.1. I asked around about the IPs and it appears to be legit, but I don't have any 100% official confirmation.

I get some feeling like there's some big secret we've not been let in on yet, like a data center is being relocated outside the US for cost reduction perhaps.

keyplyr




msg:4478915
 2:03 am on Jul 26, 2012 (gmt 0)



Aside: 2 years ago I saw a Googlebot come from a range assigned to a Brazilian telco. It was blocked due to the unverified IP range. It kept coming back requesting 100+ pages, all blocked. A few days later my indexed pages total dropped by a hundred+ at Google WT.

I posted the strange event here at WW but got arguments that it could not have happened, but it did. I have since come to the conclusion that it was obviously a true Googlebot, but either it inadvertently got on this Brazilian range somehow, or the anomaly was at my server/router/switches/etc (which they denied of course.)

Stuff happens that can't be explained sometimes. If you/we never see another occurrence of Googlebot coming from this same Chinese range, then it may be one of those.

incrediBILL




msg:4478953
 4:23 am on Jul 26, 2012 (gmt 0)

2 years ago I saw a Googlebot come from a range assigned to a Brazilian telco.


What are the IPs, do you still have the info?

What about reverse DNS?

Google swears only legit crawlers have a reverse DNS of crawl-nnn-nnn-nnn-nnn.googlebot.com, which will match the forward DNS as well. Anything that doesn't meet those criteria I've been dumping for 6 years with no ill effects.

Sure it wasn't some proxy site? Googlebot can do some wacky things with proxies.
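A minimal Python sketch of that reverse-then-forward check, for illustration only. The function name and the injectable `reverse_lookup` / `forward_lookup` parameters are assumptions added so the logic can be exercised without live DNS. It tests only for a `.googlebot.com` suffix rather than the exact crawl-nnn-nnn-nnn-nnn pattern.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes a reverse DNS check followed by a forward DNS check."""
    # Default to real DNS; the lookups are parameters so they can be stubbed out.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip)  # step 1: IP -> hostname (PTR record)
    except OSError:
        return False  # no reverse DNS at all

    if not host.lower().rstrip(".").endswith(".googlebot.com"):
        return False  # hostname is not under googlebot.com

    try:
        # step 2: the hostname must resolve back to the same IP
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Called as `is_verified_googlebot("203.208.60.5")` it performs two live lookups, so in practice the result would usually be cached per IP rather than repeated on every request.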

keyplyr




msg:4478963
 4:57 am on Jul 26, 2012 (gmt 0)

No, I no longer have the logs. As I said, I started a thread about it here at WW and got some very abrupt replies calling BS. Regardless, it was an authentic Googlebot, since I lost indexing on the exact pages it was getting 403'd on. Took a couple of weeks to get those pages re-indexed.

Reverse DNS said it was some Brazilian telco, not a Googlebot host. I said all this in the above post.

Anyway, just an example of some screwy behavior that goes against the rules but does in fact happen sometimes.

keyplyr




msg:4478975
 5:38 am on Jul 26, 2012 (gmt 0)


Here's that thread (thanks to wilderness):

[webmasterworld.com...]

g1smd




msg:4478989
 7:07 am on Jul 26, 2012 (gmt 0)

I get some feeling like there's some big secret we've not been let in on yet, like a data center is being relocated outside the US for cost reduction perhaps.

Half of Google's datacentres are outside the US.

MxAngel




msg:4478994
 7:22 am on Jul 26, 2012 (gmt 0)

Although I've had that CIDR range listed as Googlebot for a while, the only hit I've got from that range seems to be a human.

IP: 203.208.61.240

Header Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.54 Safari/536.5

bhukkel




msg:4479007
 8:59 am on Jul 26, 2012 (gmt 0)

Also the range: 203.208.32.0 - 203.208.63.255

This is also a Google CN range.

dstiles




msg:4479201
 7:42 pm on Jul 26, 2012 (gmt 0)

I've had that range blocked for about 15 months - it's China, it's a bot, ergo it's blocked.

The only G bots I (reluctantly) allow are from USA. All else G is blocked.

Oh, and they lie about rDNS. At least, for non-crawl bots. I've been seeing a lot of verification bots recently and none of them have appropriate DNS.

lucy24




msg:4479251
 10:19 pm on Jul 26, 2012 (gmt 0)

The site being crawled was not a Chinese site.

Other way around: crawling from China shows you what viewers in China see. The variable is the user's location, not the site's location.

There have been earlier threads about the googlebot being based in the US, making it impossible to tell what your non-US visitors will see. So it shouldn't be surprising if every country has a google range tucked away somewhere, crawling all the same sites as the US-based googlebot.

dstiles




msg:4479565
 8:16 pm on Jul 27, 2012 (gmt 0)

Surely we here would have registered the IP ranges by now if that were so?

I have only this one Chinese range outside of the US, no other. Had there been valid bot hits from consistent IP ranges, I'm sure I would have spotted them by now. Granted, they may well be using real browsers from non-G IPs, but I've seen no G-bot UA that couldn't be attributed to a scrape attempt or similar from a non-G source.

Having said that, of course, there are known instances of google people outside of the US using US IP ranges for their bots - eg the mocality scandal used 74.125.0.0/16.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved