216.239.38.136 - - [02/Jan/2006:20:46:26 -0500] "GET /specs.html HTTP/1.0" 403 707 "-" "Mozilla/4.0 (compatible; MSIE 6.0;)"
That IP address resolves to Google Inc. in Mountain View, but the User-agent string is incomplete -- no OS, OS version, or encryption level. Yes, I know these are 'optional' fields, but they're always present on valid browser visits. There is also a subtle syntax error in that UA string.
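For anyone who wants to run the same kind of check over their own logs, here is a rough Python sketch. It assumes Apache combined log format (which the line above appears to be). The platform-token list is my own heuristic, not the filter used in this post. The trailing-semicolon check is only a guess at the sort of syntax slip meant, since the post doesn't say which error it has in mind. The names `LOG_RE`, `PLATFORM_RE` and `ua_problems` are made up for illustration.

```python
import re

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

# Assumed heuristic: a genuine MSIE UA normally names a platform.
PLATFORM_RE = re.compile(r'Windows|Macintosh|Mac_PowerPC', re.IGNORECASE)


def ua_problems(ua):
    """Return a list of reasons an MSIE-style UA looks incomplete."""
    problems = []
    match = re.search(r'\(([^)]*)\)', ua)
    if 'MSIE' not in ua or match is None:
        return problems  # not an MSIE-style UA; this sketch ignores it
    inner = match.group(1)
    if not PLATFORM_RE.search(inner):
        problems.append('no platform token')
    if inner.rstrip().endswith(';'):
        problems.append('trailing semicolon before closing parenthesis')
    return problems


line = ('216.239.38.136 - - [02/Jan/2006:20:46:26 -0500] '
        '"GET /specs.html HTTP/1.0" 403 707 "-" '
        '"Mozilla/4.0 (compatible; MSIE 6.0;)"')
entry = LOG_RE.match(line)
print(entry.group('ip'), ua_problems(entry.group('ua')))
```

A full MSIE string such as the Windows NT 5.1 one quoted later in this thread comes back with an empty list, so only the stripped-down UA gets flagged.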
I'm pretty sure this is a 'bot, because I serve a *very* short 403 error page to save bandwidth, and you have to click a text link on that short page to get more info. Most innocents caught by my UA filters *do* click that link, but this visitor did not.
I'm feeling a bit like Brett now, having 403'ed those requests! <grin>
Jim
Since at least last October, I've seen accesses from the following Google IPs with non-Googlebot UAs. Googlebot asks for, and heeds, robots.txt on its umpteen times/day rounds; these do not.
Also, after I started mod_rewriting the IPs, I noticed that some of them changed to a different IP upon reaching the target page.
IPs:
64.233.172.2 -> redirected; IP changed to -> 64.233.172.21
64.233.173.73
64.233.173.100
72.14.194.29 -> redirected; IP changed to -> 72.14.194.18
User-Agents:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
MovieTrack
Referers: None
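A quick way to see how closely the changed IPs sit to the originals is to test whether each pair shares a network block. The sketch below uses Python's standard `ipaddress` module and the two redirect pairs from the list above. The /24 prefix is just an assumption for illustration, not a statement about how Google actually allocates its ranges, and `same_network` is a made-up helper name.

```python
import ipaddress

# Redirect pairs copied from the list above:
# (IP that was redirected, IP seen at the target page).
REDIRECT_PAIRS = [
    ("64.233.172.2", "64.233.172.21"),
    ("72.14.194.29", "72.14.194.18"),
]


def same_network(ip_a, ip_b, prefix=24):
    """True if ip_b lies in the /prefix network that contains ip_a."""
    # strict=False lets us build the network from a host address.
    net = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in net


for before, after in REDIRECT_PAIRS:
    print(before, "->", after, "same /24:", same_network(before, after))
```

Both pairs stay inside the same /24, so the hop looks more like a neighbouring machine in one pool than a different network altogether.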
At first I thought these Google hits were from employees, but the redirect/target page includes info about how to contact me for access, and none of the Google 'users' has ever done so.
So now, I figure Google bots -- or people/rogue bots running through Google results? No clue. But it's increasingly irksome and worrisome because these IPs are going where Google is not supposed to go.
(Aside: Jeeves has been stealthy via IP, too, and also ignoring robots.txt. Same kind of situation, I wonder?)
Got itself banned in September.
For example, I thought "Google-TR-1" was Google desktop search-related. E.g.:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8; Google-TR-1) Gecko/20050511 Firefox/1.0.4 (ax)
Then again, I don't know about Desktop Search at all. Shoot, with so many real and/or apparent Google-related IPs and UAs, it's more than a little confusing. E.g.:
Mozilla/4.0 (MobilePhone SCP-5500/US/1.0) NetFront/3.0 MMP/2.0 (compatible; Googlebot/2.1; [google.com...]
Bot? Phone? Bot for phone? Faux bot and fauxne? Whatever. If The Googles don't 'do' robots.txt, they're 302'd or 403'd.
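As a rough illustration of that rule, here is a Python sketch that takes (IP, path) pairs already pulled from a log and lists the IPs that fetched pages without ever requesting robots.txt. It only covers the "asks for" half; whether a visitor actually *heeds* the file needs a comparison against the Disallow rules, which is left out. The function name and the sample data are hypothetical (the sample IPs are from the reserved documentation range, not from anyone's real log).

```python
def ips_skipping_robots(requests):
    """Return sorted IPs that made requests but never asked for /robots.txt.

    requests: iterable of (ip, path) tuples taken from an access log.
    """
    seen = set()
    asked = set()
    for ip, path in requests:
        seen.add(ip)
        # Ignore any query string when comparing the path.
        if path.split("?", 1)[0] == "/robots.txt":
            asked.add(ip)
    return sorted(seen - asked)


# Hypothetical sample data for illustration only.
sample = [
    ("192.0.2.10", "/robots.txt"),
    ("192.0.2.10", "/index.html"),
    ("192.0.2.55", "/specs.html"),
    ("192.0.2.77", "/private/page.html?id=3"),
]
print(ips_skipping_robots(sample))
```

The resulting list is only a set of candidates for a 302 or 403; what to do with them is still a judgment call.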