Forum Moderators: Ocean10000 & incrediBILL

Fake Googlebot from StandardShell

Spoofed UA, no additional headers

     
5:16 pm on Nov 9, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Bearing in mind all the discussion in the recent Fake Googlebot from SoftLayer [webmasterworld.com] thread, here's another apparently-fake Googlebot from CalPop/CoreExpress/StandardShell. Note the almost-correct UA string: it lacks the "+" that precedes the URL in the genuine Googlebot UA.

No extra headers were sent. No rDNS is available.

64.69.34.135 - - [09/Nov/2008:03:52:46 -0500] "GET / HTTP/1.1" 403 666 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"
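
For anyone who wants to catch this kind of near-miss automatically, here is a rough Python sketch. It assumes the Apache combined log format and compares the logged UA against the string Google publishes for Googlebot, which has a "+" before the URL. The names `LOG_PATTERN`, `parse_log_line` and `is_near_miss_googlebot` are made up for illustration, and other genuine Google crawler UAs would have to be allowed for before relying on it.

```python
import re

# UA string Google publishes for Googlebot (note the "+" before the URL).
GENUINE_GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Assumption: the log is in Apache "combined" format, as in the line above.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def parse_log_line(line):
    """Return the fields of one combined-format log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_near_miss_googlebot(ua):
    """True when the UA mentions Googlebot but is not the exact published string."""
    return "googlebot" in ua.lower() and ua != GENUINE_GOOGLEBOT_UA
```

Run against the log line above, `parse_log_line` yields the IP and UA, and `is_near_miss_googlebot` flags the UA because of the missing "+".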

Jim

8:09 am on Nov 10, 2008 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5811
votes: 64


I've had problems with at least one scraper from CoreExpress hosting and have had their range blocked for a few months.

1:24 pm on Nov 10, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Here's a CoreExpress hit from May 2007 (note the improper UA, with a stray space before the first semicolon).

64.69.46.zzz - - [07/May/2007:12:40:18 -0500] "GET /MyFolder/MyPage.htm HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)"

This recent activity (i.e., Jim's heads-up) merely confirms the need for documentation and an effective plan of action when these pests first appear, which would have prevented this recent activity.

I detest even mentioning colos, as the mention merely provides them with free advertising.

Don

3:55 pm on Nov 10, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 29, 2002
posts:1954
votes: 0


Jim

I see the following:
64.69.34.135 was logged 123 times,
starting at 07:03:49 AM on Thursday, November 6, 2008.
The initial browser was Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]

They hit with 0 seconds in between. I'm sure it's safe to block via IP range...yes?
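
A quick way to put a number on "0 seconds in between" is to compute the gaps between consecutive request timestamps for one IP. The Python sketch below assumes timestamps in the Apache log format seen earlier in this thread; the function names and the one-second threshold are illustrative guesses, not a recommendation.

```python
from datetime import datetime


def parse_clf_time(stamp):
    """Parse an Apache-style timestamp such as '09/Nov/2008:03:52:46 -0500'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def request_gaps(stamps):
    """Return the gaps, in seconds, between consecutive requests."""
    times = sorted(parse_clf_time(s) for s in stamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


def looks_automated(stamps, min_human_gap=1.0):
    """True when every gap is shorter than min_human_gap (a hypothetical threshold)."""
    gaps = request_gaps(stamps)
    return bool(gaps) and all(gap < min_human_gap for gap in gaps)
```

Rapid-fire requests alone don't prove a fake, so this is better treated as one signal alongside the UA and DNS checks.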

1:54 am on Nov 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Safe? I don't know, and wouldn't tell anyone what to do anyway...

Since Google encourages Webmasters to check reverse-DNS on Googlebot requests, but there is no rDNS on this IP address, and since this user-agent did not send the usual and customary Googlebot headers, I blocked it without further thought. But that's just me... :)
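
For what it's worth, the rDNS half of that decision could be sketched in Python roughly as follows. It assumes genuine Googlebot PTR records end in googlebot.com or google.com, per Google's published verification advice; the function names are invented for the example, and the header check is left out because the exact headers aren't listed here.

```python
import socket

# Assumption: domains under which genuine Googlebot PTR records fall.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None when no rDNS is available."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def hostname_is_google(hostname):
    """True when the hostname sits under one of the assumed Google domains."""
    if not hostname:
        return False
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def claims_googlebot_without_google_rdns(ip, ua, lookup=reverse_lookup):
    """True when the UA says Googlebot but the IP has no Google rDNS."""
    if "googlebot" not in ua.lower():
        return False
    return not hostname_is_google(lookup(ip))
```

The `lookup` parameter is there only so the logic can be exercised without touching the network.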

Jim

2:46 am on Nov 12, 2008 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


rDNS checks might help if google maintained them. I keep getting feedfetcher (to three domains at the same time) from an IP that has no rDNS at all, although it's a google IP. The included URL is redirected to a generic googlebot page.

Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
IP: 72.14.193.*

3:28 am on Nov 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1312
votes: 0


Feedfetcher doesn't claim to be "Googlebot" though.

The devil is in the detail.

...

3:28 pm on Nov 12, 2008 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


It implies it by redirecting the URL to the googlebot page. So if it doesn't have an rDNS I suppose google wants us, by its own recommendation, to reject its robot.

4:18 pm on Nov 12, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1312
votes: 0


Don't get me wrong, I sympathise entirely. I too would prefer consistency.

The problem seems to lie with various Mountain View robots using IP ranges that are also proxies used by the Wireless Transcoder and Translator. But none are named "Googlebot".

...

5:17 pm on Nov 17, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Here I sit, "hat in hand," ingesting major portions of humble pie.

In the past, I've not been plagued with the FAKES that have been mentioned here in numerous threads. Thus I didn't feel the need to implement access control based on Google's IPs (it is now implemented).

Yesterday, in a few short hours, I had five US IPs and two RIPE IPs serving up FAKE Google UAs.
The IP ranges continue to grow mildly.

In addition, I'm seeing new pokes and probes (from many IPs and UAs, most of which seem to fail under denials already in place) that I haven't seen in recent months (or longer). I can only assume they are a direct result of these FAKE Googles that leaked through.

10:22 pm on Dec 2, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


This one has been pinging me almost daily for about 10-12 page attempts.

I would like to think this is a real spoof and not Google being tricked into crawling a proxy this frequently.

11:10 pm on Dec 2, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Bill,
A recent review of the UAs turned up something I missed previously.
There's a double trailing blank space after a semi-colon.
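
Spacing quirks like this are easy to test for with a regular expression. The Python sketch below flags both a run of blanks after a semicolon and a blank before one; the function name is made up for the example. The UA with the double blank isn't reproduced intact in this thread, so any string fed to the second pattern is a hypothetical reconstruction.

```python
import re

# Whitespace immediately before a semicolon, e.g. "compatible ;".
SPACE_BEFORE_SEMICOLON = re.compile(r"\s;")
# Two or more whitespace characters after a semicolon, e.g. ";  Googlebot".
MULTI_SPACE_AFTER_SEMICOLON = re.compile(r";\s{2,}")


def semicolon_spacing_anomalies(ua):
    """Return a list describing odd spacing around semicolons in a UA string."""
    found = []
    if SPACE_BEFORE_SEMICOLON.search(ua):
        found.append("space before semicolon")
    if MULTI_SPACE_AFTER_SEMICOLON.search(ua):
        found.append("multiple spaces after semicolon")
    return found
```

A well-formed UA returns an empty list, so anything non-empty is worth a second look in the logs.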

11:30 pm on Dec 2, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Interesting, I didn't notice that.

Full trip DNS checking keeps them out but they just keep trying.

To what end?
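
For readers unfamiliar with the full-trip check mentioned above, here is a minimal Python sketch: look up the PTR record for the IP, confirm the hostname is in an expected domain, then resolve that hostname forward and confirm it maps back to the same IP. The function name, the suffix list, and the injectable `reverse`/`forward` parameters are assumptions for illustration, and it handles IPv4 only.

```python
import socket


def full_trip_dns_ok(ip,
                     allowed_suffixes=(".googlebot.com", ".google.com"),
                     reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Return True only if ip -> hostname -> ip round-trips within an allowed domain."""
    try:
        hostname = reverse(ip)[0]          # step 1: reverse (PTR) lookup
    except OSError:
        return False                       # no rDNS at all
    if not hostname.lower().endswith(tuple(allowed_suffixes)):
        return False                       # step 2: hostname is outside the expected domains
    try:
        forward_ips = forward(hostname)[2] # step 3: forward lookup
    except OSError:
        return False
    return ip in forward_ips               # step 4: must map back to the same IP
```

The forward step matters because whoever controls an IP block can set its PTR record to any name, but cannot make Google's own DNS point that name back at their address.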

11:58 pm on Dec 2, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


google bot NOT from Class B keeps them out ;)
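
Expressed in Python rather than server config, that rule might look like the sketch below. The post doesn't name the Class B, so the default `66.249.0.0/16` is purely an illustrative assumption, as is the function name; substitute whatever range you actually trust.

```python
import ipaddress


def should_block(ip, ua, allowed_network="66.249.0.0/16"):
    """Block when the UA claims Googlebot but the IP is outside the allowed range.

    allowed_network is a placeholder assumption, not a value given in this thread.
    """
    claims_googlebot = "googlebot" in ua.lower()
    in_range = ipaddress.ip_address(ip) in ipaddress.ip_network(allowed_network)
    return claims_googlebot and not in_range
```

Requests that don't claim to be Googlebot pass through untouched, so this only costs you anything if Google starts crawling from a new range.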
 
