Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Fake Googlebot from StandardShell
Spoofed UA, no additional headers
jdMorgan




msg:3783043
 5:16 pm on Nov 9, 2008 (gmt 0)

Bearing in mind all the discussion in the recent Fake Googlebot from SoftLayer [webmasterworld.com] thread, here's another apparently fake Googlebot (note the almost-correct UA string) from CalPop/CoreExpress/StandardShell.

No extra headers were sent. No rDNS is available.

64.69.34.135 - - [09/Nov/2008:03:52:46 -0500] "GET / HTTP/1.1" 403 666 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"
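The UA string in this log entry differs from the published Googlebot string only by the missing "+" before the URL. A minimal Python sketch of that comparison follows. The function name and the exact-match policy are assumptions for illustration, not something taken from this thread, and since Google uses other UA strings as well, an exact match is only a heuristic.

```python
# Published desktop Googlebot UA; note the "+" before the URL.
GENUINE_GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def claims_googlebot_but_malformed(user_agent: str) -> bool:
    """True when a UA mentions Googlebot but is not the exact published string."""
    return "googlebot" in user_agent.lower() and user_agent != GENUINE_GOOGLEBOT_UA


# UA copied from the log line above (no "+" before the URL).
logged = "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"
print(claims_googlebot_but_malformed(logged))
```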

Jim

 

keyplyr




msg:3783275
 8:09 am on Nov 10, 2008 (gmt 0)

I've had problems with at least one scraper from CoreExpress hosting, and I've had their range blocked for a few months.

wilderness




msg:3783424
 1:24 pm on Nov 10, 2008 (gmt 0)

Here's a CoreExpress from May 2007. (note the improper UA).

64.69.46.zzz - - [07/May/2007:12:40:18 -0500] "GET /MyFolder/MyPage.htm HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)"

This recent activity (i.e., Jim's heads-up) merely confirms the need for documentation and an effective plan of action when these pests first appear, which would have prevented this recent activity.

I detest even mentioning colos, as the mention merely provides them with free advertising.

Don

The Contractor




msg:3783501
 3:55 pm on Nov 10, 2008 (gmt 0)

Jim

I see the following:
64.69.34.135 was logged 123 times,
starting at 07:03:49 AM on Thursday, November 6, 2008.
The initial browser was Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]

They hit with 0 seconds in between. I'm sure it's safe to block via IP range...yes?
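Counts and gaps like these can be pulled straight from an access log. The following is a rough Python sketch that assumes the combined log format seen in the entries quoted in this thread. The `summarize` helper is hypothetical, and the second sample line is a made-up duplicate used only to show a zero-second gap.

```python
import re
from collections import defaultdict
from datetime import datetime

# Captures the client IP and the bracketed timestamp of a combined-format line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "[^"]*"$')


def summarize(lines):
    """Map each IP to (hit count, smallest gap in seconds between consecutive hits)."""
    times = defaultdict(list)
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        ip, stamp = match.groups()
        times[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))
    summary = {}
    for ip, stamps in times.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        summary[ip] = (len(stamps), min(gaps) if gaps else None)
    return summary


# First line is from the thread; the second is an invented repeat for illustration.
line = ('64.69.34.135 - - [09/Nov/2008:03:52:46 -0500] "GET / HTTP/1.1" 403 666 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"')
sample = [line, line]
print(summarize(sample))
```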

jdMorgan




msg:3784532
 1:54 am on Nov 12, 2008 (gmt 0)

Safe? I don't know, and wouldn't tell anyone what to do anyway...

Since Google encourages Webmasters to check reverse-DNS on Googlebot requests, but there is no rDNS on this IP address, and since this user-agent did not send the usual and customary Googlebot headers, I blocked it without further thought. But that's just me... :)
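The reverse-DNS check mentioned here is usually done as a round trip: look up the PTR record, confirm the host name is in googlebot.com or google.com, then resolve that name forward and confirm it returns the same IP. A minimal Python sketch follows. The function name and the injectable `reverse`/`forward` parameters are assumptions added so the logic can be exercised without live DNS, and this version only handles IPv4.

```python
import socket


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an address claiming to be Googlebot."""
    try:
        host = reverse(ip)[0]
    except OSError:
        return False  # no rDNS at all, as reported for the address in this thread
    if not host.lower().rstrip(".").endswith((".googlebot.com", ".google.com")):
        return False  # PTR record points somewhere other than Google
    try:
        addresses = forward(host)[2]
    except OSError:
        return False  # the PTR host name does not resolve forward
    return ip in addresses  # the forward lookup must return the original IP
```

Called as `is_verified_googlebot(ip)` it uses the system resolver. Results are worth caching, since two DNS lookups per request would be slow.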

Jim

dstiles




msg:3784553
 2:46 am on Nov 12, 2008 (gmt 0)

rDNS checks might help if google maintained them. I keep getting feedfetcher (to three domains at the same time) from an IP that has no rDNS at all, although it's a google IP. The included URL is redirected to a generic googlebot page.

Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
IP: 72.14.193.*

Samizdata




msg:3784563
 3:28 am on Nov 12, 2008 (gmt 0)

Feedfetcher doesn't claim to be "Googlebot" though.

The devil is in the detail.

...

dstiles




msg:3784859
 3:28 pm on Nov 12, 2008 (gmt 0)

It implies it by redirecting the URL to the Googlebot page. So if it doesn't have an rDNS, I suppose Google wants us, by its own advice, to reject its robot.

Samizdata




msg:3784884
 4:18 pm on Nov 12, 2008 (gmt 0)

Don't get me wrong, I sympathise entirely. I too would prefer consistency.

The problem seems to lie with various Mountain View robots using IP ranges that are also proxies used by the Wireless Transcoder and Translator. But none are named "Googlebot".

...

wilderness




msg:3788067
 5:17 pm on Nov 17, 2008 (gmt 0)

Here I sit with "hat in hand" and major portions missing from my humility being ingested.

In the past, I've not been plagued with the FAKES that have been mentioned here in numerous threads. Thus I didn't feel the need to implement access control based on Google's IPs (which is now in place).

Yesterday, in a few short hours, I had five US IPs and two RIPE IPs serving up FAKE Google UAs.
The IP ranges continue to grow mildly.

In addition, I'm seeing new pokes and probes (from many IP's and UA's, most of which seem to fail under previous denials in place), which I haven't seen in recent months (or longer) that I may only assume are a direct result of these leaked through FAKE Googles.

incrediBILL




msg:3798629
 10:22 pm on Dec 2, 2008 (gmt 0)

This one has been pinging me almost daily for about 10-12 page attempts.

I would like to think this is a real spoof and not Google being faked into crawling a proxy with this amount of frequency.

wilderness




msg:3798668
 11:10 pm on Dec 2, 2008 (gmt 0)

Bill,
A recent review of the UAs turned up something I missed previously.
There's a double trailing blank space after a semi-colon.
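Spacing quirks like this can be flagged mechanically. The sketch below is an illustration only: the pattern and function name are assumptions, and the doubled-space sample is constructed, since the log lines as quoted in this thread show single spaces.

```python
import re

# Two or more whitespace characters after a semicolon, or whitespace before one.
SUSPICIOUS_SPACING = re.compile(r";\s{2,}|\s;")


def has_odd_spacing(user_agent: str) -> bool:
    """True when the UA has doubled spacing after ';' or a space before ';'."""
    return SUSPICIOUS_SPACING.search(user_agent) is not None


# Constructed example with a double space after the second semicolon.
doubled = "Mozilla/5.0 (compatible; Googlebot/2.1;  http://www.google.com/bot.html)"
# UA from the May 2007 log entry earlier in the thread (space before ';').
msie = "Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)"
print(has_odd_spacing(doubled), has_odd_spacing(msie))
```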

incrediBILL




msg:3798684
 11:30 pm on Dec 2, 2008 (gmt 0)

Interesting, I didn't notice that.

Full round-trip DNS checking keeps them out, but they just keep trying.

To what end?

wilderness




msg:3798703
 11:58 pm on Dec 2, 2008 (gmt 0)

google bot NOT from Class B keeps them out ;)
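A rule of this kind ("claims Googlebot, but not from the expected range") might be sketched in Python as follows. The /16 shown is a placeholder assumption for illustration, not a verified list of Google's crawler ranges, and `should_block` is a hypothetical helper.

```python
import ipaddress

# Placeholder allowlist: this /16 ("Class B" sized) network is an assumption
# for illustration only, not a verified list of Google crawler ranges.
ALLOWED_NETWORKS = [ipaddress.ip_network("66.249.0.0/16")]


def should_block(ip: str, user_agent: str) -> bool:
    """Block requests that claim to be Googlebot but come from outside the allowlist."""
    if "googlebot" not in user_agent.lower():
        return False  # this rule only concerns UAs claiming to be Googlebot
    address = ipaddress.ip_address(ip)
    return not any(address in network for network in ALLOWED_NETWORKS)


fake_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"
print(should_block("64.69.34.135", fake_ua))
```

A range check is cheaper than DNS lookups but has to be kept current by hand, so it may suit sites that already maintain IP deny lists.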
