| 11:05 pm on Jul 25, 2012 (gmt 0)|
Is the UA exactly the same?
| 12:44 am on Jul 26, 2012 (gmt 0)|
Is this strictly for Google China-- that is, the SERPs you get if you are in China and go to google dot com? If so, I don't see any particular reason for allowing them in, since the humans viewing the search results will themselves be blocked.
Don't know about your site, but I don't think that an inability to access mine is going to be the last straw that leads someone to emigrate or take dramatic political action ;) ("OK, that does it! If I'm not allowed to read about how to say 'weed whacker' in Berber, I'm moving to Thailand.")
| 12:54 am on Jul 26, 2012 (gmt 0)|
The site being crawled was not a Chinese site. Block what you want, but this appears to be a legit range for Googlebot.com, and it isn't new either; that's the shocker. Maybe it's only crawling some non-Asian sites, not all. No clue.
I'm just adding it to the list of allowed IPs in the firewall and will watch for any further activity.
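For anyone keeping a similar allow list in a script rather than in the firewall itself, a range check could look roughly like the sketch below. The ranges shown are placeholder documentation networks, not real Googlebot ranges, and the names `ALLOWED_RANGES` and `is_allowed` are made up for illustration.

```python
import ipaddress

# Placeholder ranges reserved for documentation (RFC 5737).
# These are NOT real Googlebot ranges; substitute the ranges you trust.
ALLOWED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_allowed(ip):
    """Return True if ip parses and falls inside one of the allowed ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IP address string at all.
        return False
    return any(addr in net for net in ALLOWED_RANGES)
```

A range match only says the address is on your list; it does not prove the visitor is really Googlebot.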
| 1:07 am on Jul 26, 2012 (gmt 0)|
Bill, was the Googlebot UA exactly the same?
| 1:51 am on Jul 26, 2012 (gmt 0)|
|was the Googlebot UA exactly the same? |
I think so.
It didn't hit my site directly. I'm working with 3rd party info, and all I know is that it ID'd itself as Googlebot/2.1. I asked around about the IPs and the range appears to be legit, but I don't have any 100% official confirmation.
I get the feeling there's some big secret we've not been let in on yet, like a data center being relocated outside the US for cost reduction perhaps.
| 2:03 am on Jul 26, 2012 (gmt 0)|
Aside: 2 years ago I saw a Googlebot come from a range assigned to a Brazilian telco. It was blocked due to the unverified IP range. It kept coming back requesting 100+ pages, all blocked. A few days later my indexed pages total dropped by a hundred+ at Google WT.
I posted the strange event here at WW but got arguments that it could not have happened, but it did. I have since come to the conclusion that it was obviously a true Googlebot, but either it inadvertently got on this Brazilian range somehow, or the anomaly was at my server/router/switches/etc (which they denied of course.)
Stuff happens that can't be explained sometimes. If you/we never see another occurrence of Googlebot coming from this same Chinese range, then it may be one of those.
| 4:23 am on Jul 26, 2012 (gmt 0)|
|2 years ago I saw a Googlebot come from a range assigned to a Brazilian telco. |
What are the IPs, do you still have the info?
What about reverse DNS?
Google swears only legit crawlers have a reverse DNS of crawl-nnn-nnn-nnn-nnn.googlebot.com, and that the forward DNS of that hostname will resolve back to the same IP. Anything that doesn't meet those criteria I've been dumping for 6 years with no ill effects.
Are you sure it wasn't some proxy site? Googlebot can do some wacky things with proxies.
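The reverse-then-forward DNS check described above can be sketched in Python roughly as follows. This is a minimal sketch, not an official Google tool: the function name `is_verified_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are my own (they are there only so the logic can be exercised without live DNS), and it checks only the googlebot.com suffix mentioned in this thread.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of
    IPs are hypothetical hooks; by default they use the system resolver.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS. No PTR record means no verification.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the hostname must sit under googlebot.com.
    host = host.rstrip(".").lower()
    if not host.endswith(".googlebot.com"):
        return False

    # Step 3: forward DNS on that hostname must lead back to the same IP.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Calling `is_verified_googlebot(remote_ip)` with no extra arguments does live lookups, so in practice you would want to cache the result per IP rather than resolve on every request.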
| 4:57 am on Jul 26, 2012 (gmt 0)|
No, I no longer have the logs. As I said, I started a thread about it here at WW and got some very abrupt replies calling BS. Regardless, it was an authentic Googlebot, since I lost indexing on the exact pages it was getting 403'd on. Took a couple of weeks to get those pages re-indexed.
Reverse DNS said it was some Brazilian telco, not a Googlebot host. I said all this in the above post.
Anyway, just an example of some screwy behavior that goes against the rules but does in fact happen sometimes.
| 5:38 am on Jul 26, 2012 (gmt 0)|
Here's that thread (thanks to wilderness):
| 7:07 am on Jul 26, 2012 (gmt 0)|
|I get some feeling like there's some big secret we've not been let in on yet, like a data center is being relocated outside the US for cost reduction perhaps. |
Half of Google's datacentres are outside the US.
| 7:22 am on Jul 26, 2012 (gmt 0)|
Although I've had that CIDR range listed as Googlebot for a while, the only hit I've had from that range seems to be a human.
Header Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.54 Safari/536.5
| 8:59 am on Jul 26, 2012 (gmt 0)|
Also the range: 18.104.22.168 - 22.214.171.124
This is also a Google CN range.
| 7:42 pm on Jul 26, 2012 (gmt 0)|
I've had that range blocked for about 15 months - it's china, it's a bot, ergo it's blocked.
The only G bots I (reluctantly) allow are from USA. All else G is blocked.
Oh, and they lie about rDNS. At least, for non-crawl bots. I've been seeing a lot of verification bots recently and none of them have appropriate DNS.
| 10:19 pm on Jul 26, 2012 (gmt 0)|
|The site being crawled was not a Chinese site. |
Other way around: crawling from China shows you what viewers in China see. The variable is the user's location, not the site's location.
There have been earlier threads about Googlebot being based in the US, making it impossible to tell what your non-US visitors will see. So it shouldn't be surprising if every country has a Google range tucked away somewhere, crawling all the same sites as the US-based Googlebot.
| 8:16 pm on Jul 27, 2012 (gmt 0)|
Surely we here would have registered the IP ranges by now if that were so?
I have only this one Chinese range outside of the US, no other. Had there been valid bot hits from consistent IP ranges, I'm sure I would have spotted them by now. Granted, they may well be using real browsers from non-G IPs, but I've seen no G-bot UA that couldn't be attributed to a scrape attempt or similar from a non-G source.
Having said that, of course, there are known instances of Google people outside of the US using US IP ranges for their bots - e.g. the Mocality scandal used 126.96.36.199/16.