Forum Moderators: Ocean10000 & incrediBILL

Fake Googlebot from StandardShell

Spoofed UA, no additional headers


jdMorgan

5:16 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Bearing in mind all the discussion in the recent Fake Googlebot from SoftLayer [webmasterworld.com] thread, here's another apparently-fake Googlebot (note the almost-correct UA string) from CalPop/CoreExpress/StandardShell.

No extra headers were sent. No rDNS is available.

64.69.34.135 - - [09/Nov/2008:03:52:46 -0500] "GET / HTTP/1.1" 403 666 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"
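The "almost-correct" part is the missing "+" before the URL: Googlebot's own UA string reads "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)". Below is a minimal Python sketch, assuming Apache's combined log format, that parses a line like the one above and flags any UA that claims to be Googlebot but is not that exact string. The names parse_line and suspicious_googlebot are hypothetical. Google also uses other crawler UA variants, so treat a match as a reason to run an rDNS check rather than as grounds for blocking on its own.

```python
import re

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

# Googlebot's desktop UA string of the period (note the "+" before the URL).
GENUINE_GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def suspicious_googlebot(entry):
    """True if the UA mentions Googlebot but isn't the exact known string."""
    ua = entry["ua"]
    return "googlebot" in ua.lower() and ua != GENUINE_GOOGLEBOT_UA
```

Run against the hit above, parse_line picks out host 64.69.34.135 and status 403, and suspicious_googlebot returns True because of the missing "+".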

Jim

keyplyr

8:09 am on Nov 10, 2008 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I've had problems with at least one scraper from CoreExpress hosting and have had their range blocked for a few months.

wilderness

1:24 pm on Nov 10, 2008 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Here's a CoreExpress hit from May 2007 (note the improper UA).

64.69.46.zzz - - [07/May/2007:12:40:18 -0500] "GET /MyFolder/MyPage.htm HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)"

This recent activity (i.e., Jim's heads-up) merely confirms the need for documentation and an effective plan of action when these pests first appear, which would have prevented this recent activity.

I detest even mentioning colos, as the mention merely gives them free advertising.

Don

The Contractor

3:55 pm on Nov 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim

I see the following:
64.69.34.135 was logged 123 times,
starting at 07:03:49 AM on Thursday, November 6, 2008.
The initial browser was Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]

They hit with 0 seconds in between. I'm sure it's safe to block via IP range...yes?
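For anyone who does decide to block by range, the range test is just CIDR membership. Below is a minimal Python sketch using the standard ipaddress module. The 64.69.34.0/24 entry is hypothetical, chosen only because it contains the logged address. Look up the actual allocation with WHOIS before blocking anything, since wilderness's May 2007 hit (64.69.46.zzz) falls outside it.

```python
import ipaddress

# Hypothetical block list for illustration only; confirm the real
# allocation via WHOIS before using anything like this.
BLOCKED_RANGES = [ipaddress.ip_network("64.69.34.0/24")]


def is_blocked(ip: str) -> bool:
    """True if the address falls inside any listed CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

A server-level deny for the same range (Apache "Deny from", a firewall rule) does the same job without touching application code. The sketch is mainly useful for checking which logged addresses a proposed range would catch.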

jdMorgan

1:54 am on Nov 12, 2008 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Safe? I don't know, and wouldn't tell anyone what to do anyway...

Since Google encourages Webmasters to check reverse-DNS on Googlebot requests, but there is no rDNS on this IP address, and since this user-agent did not send the usual and customary Googlebot headers, I blocked it without further thought. But that's just me... :)
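Google's recommended check is forward-confirmed reverse DNS. First look up the hostname for the IP. Then confirm that it ends in googlebot.com or google.com. Finally resolve that hostname forward and make sure it maps back to the same IP. Below is a minimal Python sketch using the standard socket module. The function name is hypothetical, and the resolver arguments exist only so the logic can be tested without live DNS. An address with no rDNS at all, like 64.69.34.135, fails at the first step.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        host = reverse(ip)[0]  # (hostname, aliases, addresses)
    except OSError:  # herror/gaierror: no rDNS, or lookup failed
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # rDNS exists but isn't a Google hostname
    try:
        addrs = forward(host)[2]  # forward lookup of that hostname
    except OSError:
        return False
    return ip in addrs  # hostname must resolve back to the same IP
```

The forward step matters because whoever controls an IP block can set its PTR record to anything, including a googlebot.com name. Only Google controls what googlebot.com names resolve to.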

Jim

dstiles

2:46 am on Nov 12, 2008 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



rDNS checks might help if Google maintained them. I keep getting Feedfetcher (to three domains at the same time) from an IP that has no rDNS at all, although it's a Google IP. The URL included in its UA redirects to a generic Googlebot page.

Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
IP: 72.14.193.*

Samizdata

3:28 am on Nov 12, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Feedfetcher doesn't claim to be "Googlebot" though.

The devil is in the detail.

...

dstiles

3:28 pm on Nov 12, 2008 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



It implies it by redirecting its URL to the Googlebot page. So if it has no rDNS, I suppose Google, by its own advice, wants us to reject its robot.

Samizdata

4:18 pm on Nov 12, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Don't get me wrong, I sympathise entirely. I too would prefer consistency.

The problem seems to lie with various Mountain View robots using IP ranges that are also used as proxies by the Wireless Transcoder and Translator. But none of them are named "Googlebot".

...

wilderness

5:17 pm on Nov 17, 2008 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Here I sit with "hat in hand," having eaten a good portion of humble pie.

In the past, I've not been plagued by the FAKES that have been mentioned here in numerous threads. Thus I didn't feel the need to restrict Googlebot access to Google's IPs (now implemented).

Yesterday, within a few short hours, I had five US IPs and two RIPE IPs serving up FAKE Google UAs.
The IP ranges continue to grow mildly.

In addition, I'm seeing new pokes and probes from many IPs and UAs (most of which appear to fail against the denials already in place) that I haven't seen in recent months or longer. I can only assume they are a direct result of these FAKE Googles having leaked through.

incrediBILL

10:22 pm on Dec 2, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



This one has been pinging me almost daily, with about 10-12 page attempts each time.

I would like to think this is a real spoof and not Google being faked into crawling a proxy with this amount of frequency.

wilderness

11:10 pm on Dec 2, 2008 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Bill,
A recent review of the UAs revealed something I missed previously:
there's a double space after a semicolon.
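Spacing slips like this are cheap to test for. Below is a minimal Python sketch that reports two anomalies seen in this thread: the double space after a semicolon noted here, and the space before a semicolon in the May 2007 CoreExpress UA ("compatible ;"). The pattern names, the function name and the double-space example string are hypothetical, since the exact UA from this log wasn't posted.

```python
import re

# Spacing anomalies seen in spoofed UAs in this thread.
SPACING_ANOMALIES = {
    "double space after semicolon": re.compile(r"; {2,}"),
    "space before semicolon": re.compile(r" ;"),
}


def ua_spacing_anomalies(ua: str) -> list:
    """Return the names of all spacing anomalies found in the UA."""
    return [name for name, pat in SPACING_ANOMALIES.items() if pat.search(ua)]
```

An empty list means the UA looks well-formed on spacing alone. It says nothing about whether the UA is genuine.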

incrediBILL

11:30 pm on Dec 2, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Interesting, I didn't notice that.

Full round-trip DNS checking keeps them out, but they just keep trying.

To what end?

wilderness

11:58 pm on Dec 2, 2008 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Denying any Googlebot UA NOT from Google's Class B keeps them out ;)
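The Class B approach skips DNS entirely: a request that claims to be Googlebot is allowed only if it comes from Google's own /16. Below is a minimal Python sketch using the standard ipaddress module. The 66.249.0.0/16 entry is an assumption (the block Googlebot was commonly seen crawling from), and the function name is hypothetical. Verify the range against current data before relying on it.

```python
import ipaddress

# Assumption: Googlebot crawls from 66.249.0.0/16; verify before use.
GOOGLEBOT_NETS = [ipaddress.ip_network("66.249.0.0/16")]


def allow_request(ip: str, ua: str) -> bool:
    """Deny a request claiming to be Googlebot unless it comes from a listed range."""
    if "googlebot" not in ua.lower():
        return True  # this rule only judges requests that claim to be Googlebot
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLEBOT_NETS)
```

The trade-off versus round-trip DNS is speed against maintenance. No lookups are needed, but the list has to be updated by hand if Google crawls from a new range.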