Google attempting crawl with invalid Mozilla User-agent

I hope this is just a G employee fooling around

jdMorgan

1:55 am on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hot off the server log file:

216.239.38.136 - - [02/Jan/2006:20:46:26 -0500] "GET /specs.html HTTP/1.0" 403 707 "-" "Mozilla/4.0 (compatible; MSIE 6.0;)"

That IP address resolves to Google Inc. in Mountain View, but the User-agent string is incomplete -- no OS, OS version, or encryption level. Yes, I know these are 'optional' fields, but they're always present on valid browser visits. There is also a subtle syntax error in that UA string.
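A filter for this kind of half-declared UA could be sketched in Python roughly as follows. This is only an illustration, not the filter actually in use here: the function name `ua_problems` and the platform-token pattern are made up for the example, and treating the stray semicolon before the closing parenthesis as the "subtle syntax error" is an assumption, since the post does not say which error is meant.

```python
import re

# Assumption: a genuine MSIE User-agent names a platform somewhere in its
# comment field, e.g. "Windows NT 5.1". The pattern is illustrative only.
MSIE_PLATFORM = re.compile(r"Windows|Mac_PowerPC|Macintosh", re.IGNORECASE)


def ua_problems(ua: str) -> list[str]:
    """Return the reasons a User-agent string looks incomplete (empty if none)."""
    match = re.search(r"\(([^)]*)\)", ua)
    if match is None:
        return ["no parenthesized comment field"]

    comment = match.group(1)
    tokens = [token.strip() for token in comment.split(";")]

    problems = []
    # A trailing or doubled semicolon leaves an empty token behind.
    if "" in tokens:
        problems.append("empty token (stray semicolon)")
    # MSIE is claimed, but no operating system is named.
    if any(t.startswith("MSIE") for t in tokens) and not MSIE_PLATFORM.search(comment):
        problems.append("MSIE without an OS token")
    return problems


print(ua_problems("Mozilla/4.0 (compatible; MSIE 6.0;)"))
```

A non-empty result would be grounds for serving the short 403 page; a complete string such as a normal Windows MSIE 6.0 UA comes back with an empty list.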

I'm pretty sure this is a 'bot, because I serve a *very* short 403 error page to save bandwidth, and you have to click a text link on that short page to get more info. Most innocents caught by my UA filters *do* click that link, but this visitor did not.

I'm feeling a bit like Brett now, having 403'ed those requests! <grin>

Jim

volatilegx

3:17 pm on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Something else interesting about this one -- 216.239.38.136 does not reverse-DNS into a hostname, the way Googlebot IPs do.
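For anyone who wants to script that check, here is a minimal Python sketch of a reverse lookup followed by a forward confirmation. The function names are made up for the example, and the `.googlebot.com` suffix is an assumption about how genuine Googlebot hosts are named, so adjust it to whatever your own logs show. The `lookup` and `forward` parameters exist only so the resolver can be swapped out for testing.

```python
import socket


def reverse_dns(ip, lookup=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None when no reverse record exists."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


def looks_like_googlebot(ip, lookup=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex,
                         suffixes=(".googlebot.com",)):
    """Reverse-resolve ip, check the hostname suffix, then confirm forward."""
    host = reverse_dns(ip, lookup)
    if host is None or not host.endswith(tuple(suffixes)):
        return False
    try:
        # The hostname must resolve back to the same address.
        _name, _aliases, addresses = forward(host)
    except socket.gaierror:
        return False
    return ip in addresses
```

Calling `reverse_dns("216.239.38.136")` needs a live resolver, and the answer depends on the DNS records at the time of the query.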

Pfui

4:16 pm on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thank you for asking about apparently stealth Google hits.

Since at least last October, I've seen accesses from the following Google IPs with non-Googlebot UAs. Googlebot asks for, and heeds, robots.txt on its umpteen times/day rounds; these do not.

Also, after I started mod_rewriting the IPs, I noticed some changed upon reaching the target page.

IPs:

64.233.172.2 -> redirected; IP changed to -> 64.233.172.21
64.233.173.73
64.233.173.100
72.14.194.29 -> redirected; IP changed to -> 72.14.194.18

User-Agents:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
MovieTrack

Referers: None
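
One way to spot hits like these is to scan the access log for watched IPs that never ask for robots.txt. The following is a rough Python sketch, assuming the log uses the combined format seen in the log line quoted at the top of this thread. The names `stealth_candidates` and `watched_ips` are made up for the example, and the regular expression is a simplification that will skip lines it cannot parse.

```python
import re
from collections import defaultdict

# Combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def stealth_candidates(lines, watched_ips):
    """Map each watched IP that never requested /robots.txt to the UAs it sent."""
    asked_robots = set()
    agents = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m["ip"] not in watched_ips:
            continue
        # Ignore any query string when comparing the path.
        if m["path"].split("?")[0] == "/robots.txt":
            asked_robots.add(m["ip"])
        agents[m["ip"]].add(m["ua"])
    return {ip: sorted(uas) for ip, uas in agents.items()
            if ip not in asked_robots}
```

Feed it the lines of a log file plus a set of the IPs listed above; whatever comes back is a candidate for a closer look, not proof of a robot.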

At first I thought these Google hits were from employees but the redirect/target page includes info about how to contact me for access, and none of the Google 'users' has ever done so.

So now, I figure Google bots -- or people/rogue bots running through Google results? No clue. But it's increasingly irksome and worrisome because these IPs are going where Google is not supposed to go.

(Aside: Jeeves has been stealthy via IP, too, and also ignoring robots.txt. Same kind of situation, I wonder?)

fiestagirl

4:18 pm on Jan 3, 2006 (gmt 0)

10+ Year Member



[webmasterworld.com...]
Msg #3. I suppose I should have started a new topic for this one then.

Got itself banned in September.

Pfui

4:50 pm on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We meet again, fiestagirl:) You and I wondered about Google IPs and UAs in that same thread. Alas, still no solid answers here -- just more hits, and questions!

For example, I thought "Google-TR-1" was Google desktop search-related. E.g.:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8; Google-TR-1) Gecko/20050511 Firefox/1.0.4 (ax)

Then again, I don't know about Desktop Search at all. Shoot, with so many real and/or apparent Google-related IPs and UAs, it's more than a little confusing. E.g.:

Mozilla/4.0 (MobilePhone SCP-5500/US/1.0) NetFront/3.0 MMP/2.0 (compatible; Googlebot/2.1; [google.com...]

Bot? Phone? Bot for phone? Faux bot and fauxne? Whatever. If The Googles don't 'do' robots.txt, they're 302'd or 403'd.

jdMorgan

8:55 pm on Jan 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> MobilePhone SCP-5500

That's a Sanyo mobile phone coming through Google as a proxy HTML-to-WAP translator. As such, it's not a robot and won't fetch or obey robots.txt. If you use a bad-bot script, it will need to be explicitly allowed.
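
As a sketch of what "explicitly allowed" might look like, here is a small Python example in which an allowlist is consulted before the ban decision. Everything in it is hypothetical: `ALLOWED_PROXY_UAS`, `should_ban`, and the `hit_trap` flag (true when a visitor requests a URL that robots.txt disallows) stand in for whatever your own bad-bot script uses, and the pattern is only a guess at how to match handset UAs like the one quoted.

```python
import re

# Hypothetical allowlist: UA patterns for handsets that arrive through a
# translating proxy and therefore never read robots.txt.
ALLOWED_PROXY_UAS = [
    re.compile(r"\(MobilePhone \S+\)"),
]


def should_ban(ua: str, hit_trap: bool) -> bool:
    """Decide whether a visitor that may have hit the trap URL gets banned."""
    # Allow rules are checked first so a proxy user is never banned
    # merely for ignoring robots.txt.
    if any(pattern.search(ua) for pattern in ALLOWED_PROXY_UAS):
        return False
    return hit_trap
```

Note that a UA string is trivial to forge, so an allowlist like this is best combined with an IP check.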

Jim

bull

7:44 am on Jan 7, 2006 (gmt 0)

Key_Master

8:41 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey jdMorgan, I see the same user-agent with 64.233.178.136. I don't believe it's a bot (although it could be) but in any case it's being followed by Mediapartners-Google/2.1 on my site.

flyerguy

8:52 pm on Jan 9, 2006 (gmt 0)

10+ Year Member



Well this is great. I've got a bad bots script that would most likely exclude this type of half-declared agent.

Google, what are you thinkin'? Follow the standards!

pocpocpoc

7:24 am on Feb 9, 2006 (gmt 0)

10+ Year Member



My guess is that we're seeing Google's anti-cloaking schemes at work. If the page they fetch with their stealth agent doesn't (reasonably) match the page in the Google index, they figure the page is cloaked.

Google will eventually have to buy some anonymous colo if they're going to keep this up.