Fake Googlebot from StandardShell

Spoofed UA, no additional headers

jdMorgan

5:16 pm on Nov 9, 2008 (gmt 0)

Bearing in mind all the discussion in the recent Fake Googlebot from SoftLayer [webmasterworld.com] thread, here's another apparently fake Googlebot (note the almost-correct UA string) from CalPop/CoreExpress/StandardShell.

No extra headers were sent. No rDNS is available.

64.69.34.135 - - [09/Nov/2008:03:52:46 -0500] "GET / HTTP/1.1" 403 666 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"

Jim
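As a rough illustration of the "almost-correct UA string" point: the logged string differs from the one Google publishes for its main crawler by a missing "+" before the URL. The sketch below flags any UA that claims to be Googlebot without matching that published string exactly. The function names are made up for this example, and it assumes only the main web-crawler UA matters (other Google crawlers use different strings), so treat it as a starting point rather than a complete filter.

```python
# Published UA for Google's main crawler (note the "+" before the URL).
GENUINE_GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def claims_googlebot(user_agent: str) -> bool:
    """True if the UA string mentions Googlebot at all (case-insensitive)."""
    return "googlebot" in user_agent.lower()


def is_suspect_googlebot_ua(user_agent: str) -> bool:
    """True if the UA claims Googlebot but is not the exact published string.

    Assumption: only the main crawler's UA is treated as genuine here.
    """
    return claims_googlebot(user_agent) and user_agent != GENUINE_GOOGLEBOT_UA


if __name__ == "__main__":
    # UA copied from the log line quoted in this thread.
    logged = "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"
    print(is_suspect_googlebot_ua(logged))  # True: the "+" is missing
```

A string match like this only catches careless spoofers; a UA copied correctly would pass, so it complements an IP-based check rather than replacing one.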

keyplyr

8:09 am on Nov 10, 2008 (gmt 0)

I've had problems with at least one scraper from CoreExpress hosting and have had their range blocked for a few months.

wilderness

1:24 pm on Nov 10, 2008 (gmt 0)

Here's a CoreExpress hit from May 2007 (note the improper UA).

64.69.46.zzz - - [07/May/2007:12:40:18 -0500] "GET /MyFolder/MyPage.htm HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)"

This recent activity (i.e., Jim's heads-up) merely confirms the need for documentation and an effective plan of action when these pests first appear, which would have prevented it.

I detest even mentioning colos, as the mention merely provides them with free advertising.

Don

The Contractor

3:55 pm on Nov 10, 2008 (gmt 0)

Jim

I see the following:
64.69.34.135 was logged 123 times,
starting at 07:03:49 AM on Thursday, November 6, 2008.
The initial browser was Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]

They hit with 0 seconds in between. I'm sure it's safe to block via IP range...yes?
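For anyone wanting to pull the same kind of numbers (hit count, first hit, gap between hits) out of raw logs, here is a minimal sketch. It assumes Apache-style access log lines like the ones quoted in this thread; the function name and the shape of the returned dictionary are invented for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the leading "IP ident user [timestamp]" part of an Apache-style line.
_LINE_RE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def summarize_hits(lines):
    """Return {ip: {"count", "first_seen", "min_gap_seconds"}} for parsable lines."""
    hits = defaultdict(list)
    for line in lines:
        match = _LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        ip, stamp = match.groups()
        hits[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))

    summary = {}
    for ip, times in hits.items():
        times.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        summary[ip] = {
            "count": len(times),
            "first_seen": times[0],
            # None when the IP appears only once, so there is no gap to measure.
            "min_gap_seconds": min(gaps) if gaps else None,
        }
    return summary
```

A `min_gap_seconds` of 0 across many requests is the "0 seconds in between" pattern described above. Whether that justifies a range block is still a judgment call.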

jdMorgan

1:54 am on Nov 12, 2008 (gmt 0)

Safe? I don't know, and wouldn't tell anyone what to do anyway...

Since Google encourages Webmasters to check reverse-DNS on Googlebot requests, but there is no rDNS on this IP address, and since this user-agent did not send the usual and customary Googlebot headers, I blocked it without further thought. But that's just me... :)

Jim
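The reverse-DNS check mentioned above can be sketched roughly as follows: look up the PTR record for the requesting IP, make sure the hostname ends in one of Google's crawler domains, then resolve that hostname forward and confirm it maps back to the same IP. The function names and the injectable resolver parameters are assumptions made so the example can be exercised without network access. It is IPv4-only, and a production version would want caching and timeouts.

```python
import socket
from typing import Callable, List

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip: str) -> str:
    """PTR lookup: return the primary hostname for an IP."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    """A-record lookup: return the IPv4 addresses for a hostname."""
    return socket.gethostbyname_ex(host)[2]


def verify_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no rDNS at all, as with the IP in this thread
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # PTR exists but does not point into Google's domains
    try:
        return ip in forward(host)  # hostname must resolve back to the same IP
    except OSError:
        return False
```

The forward step matters because anyone who controls the reverse zone for their own IP block can publish a PTR record claiming to be `something.googlebot.com`. Only Google controls what that name resolves to.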

dstiles

2:46 am on Nov 12, 2008 (gmt 0)

rDNS checks might help if google maintained them. I keep getting feedfetcher (to three domains at the same time) from an IP that has no rDNS at all, although it's a google IP. The included URL is redirected to a generic googlebot page.

Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
IP: 72.14.193.*

Samizdata

3:28 am on Nov 12, 2008 (gmt 0)

Feedfetcher doesn't claim to be "Googlebot" though.

The devil is in the detail.

...

dstiles

3:28 pm on Nov 12, 2008 (gmt 0)

It implies it by redirecting the URL to the googlebot page. So if it doesn't have an rDNS I suppose google wants us, by its own advocation, to reject its robot.

Samizdata

4:18 pm on Nov 12, 2008 (gmt 0)

Don't get me wrong, I sympathise entirely. I too would prefer consistency.

The problem seems to lie with various Mountain View robots using IP ranges that are also proxies used by the Wireless Transcoder and Translator. But none are named "Googlebot".

...

wilderness

5:17 pm on Nov 17, 2008 (gmt 0)

Here I sit with "hat in hand" and major portions missing from my humility being ingested.

In the past, I've not been plagued with the FAKES that have been mentioned here in numerous threads. Thus I didn't feel the need to implement access control based on Google's IPs (now implemented).

Yesterday, in a few short hours, I had five US IPs and two RIPE IPs serving up FAKE Google UAs.
The IP ranges continue to grow mildly.

In addition, I'm seeing new pokes and probes (from many IPs and UAs, most of which seem to fail under denials already in place) that I haven't seen in recent months (or longer). I can only assume they are a direct result of these leaked-through FAKE Googles.

incrediBILL

10:22 pm on Dec 2, 2008 (gmt 0)

This one has been pinging me almost daily for about 10-12 page attempts.

I would like to think this is a real spoof and not Google being faked into crawling a proxy with this amount of frequency.

wilderness

11:10 pm on Dec 2, 2008 (gmt 0)

Bill,
A recent review of the UAs turned up something I missed previously.
There's a double trailing blank space after a semi-colon.
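Spacing quirks like this one (or the space before the semi-colon in the May 2007 UA quoted earlier in the thread) can be picked up with a small pattern check. This is only a sketch: the function name is invented, and the two patterns are simply the ones mentioned in this thread, not a list of every malformed UA in circulation.

```python
import re

# Whitespace before a semi-colon, or two or more whitespace characters after one.
_ODD_SPACING_RE = re.compile(r"\s+;|;\s{2,}")


def has_irregular_semicolon_spacing(user_agent: str) -> bool:
    """True if the UA shows the spacing oddities described in this thread."""
    return _ODD_SPACING_RE.search(user_agent) is not None


if __name__ == "__main__":
    # UA copied from the May 2007 log line quoted in this thread.
    print(has_irregular_semicolon_spacing(
        "Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)"
    ))  # True: space before the first semi-colon
```

Note that forum software and log viewers often collapse repeated spaces, so the check needs to run against the raw log rather than a pasted copy.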

incrediBILL

11:30 pm on Dec 2, 2008 (gmt 0)

Interesting, I didn't notice that.

Full trip DNS checking keeps them out but they just keep trying.

To what end?

wilderness

11:58 pm on Dec 2, 2008 (gmt 0)

Denying any "Googlebot" NOT from Google's Class B keeps them out ;)
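The rule above amounts to: if the UA claims Googlebot and the IP is outside the ranges you trust, deny. A minimal sketch follows, using Python's `ipaddress` module. The thread does not name the Class B in question, so the network below is a deliberate placeholder, and the function and variable names are hypothetical.

```python
import ipaddress

# PLACEHOLDER range, not Google's. Substitute the network(s) you have
# verified yourself; the thread does not say which Class B is meant.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.1.0.0/16")]


def should_block(ip: str, user_agent: str, trusted=TRUSTED_NETWORKS) -> bool:
    """Block requests that claim Googlebot but come from outside trusted ranges."""
    if "googlebot" not in user_agent.lower():
        return False  # this rule only concerns UAs claiming to be Googlebot
    address = ipaddress.ip_address(ip)
    return not any(address in network for network in trusted)
```

Compared with a DNS round trip on each request, a static range check is cheap, but it has to be kept up to date if the crawler's address space changes.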