If you're seeing that IP?
You'll see more IPs, including more from different Comcast IPs.
Read this thread from here down:
[webmasterworld.com...]
The solution is to either require the Google IP ranges or deny UAs containing the invalid spaces.
I would, however, urge caution, because in the past few days I've seen requests with standard browser UAs from the same IPs that were using the FAKE Google UA.
RewriteCond %{HTTP_USER_AGENT} ^(AdsBot|AppEngine|Mediapartners|PageFetcher)-Google [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Examples of allowed IPs; use one RewriteCond per real Google address or range
RewriteCond %{REMOTE_ADDR} !^12\.34\.56\.78$
RewriteCond %{REMOTE_ADDR} !^12\.34\.56\.78$
RewriteCond %{REMOTE_ADDR} !^12\.34\.56\.78$
RewriteRule .* - [F]
Any IP using a Google UA that is not one of the allowed IP addresses will be served a 403 Forbidden.
For a list of valid Google IP address ranges see: [iplists.com...]
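If you want to try the rule's logic against a few log entries before putting it live, here is a minimal Python sketch of the same decision. The function name status_for is made up for illustration, the UA patterns are copied from the RewriteCond lines, and 12.34.56.78 is still only the placeholder address from the example, not a real Google IP.

```python
import re

# Same patterns as the two HTTP_USER_AGENT conditions; IGNORECASE mirrors [NC].
GOOGLE_UA = re.compile(
    r"^(AdsBot|AppEngine|Mediapartners|PageFetcher)-Google|Googlebot",
    re.IGNORECASE,
)

# Placeholder allow-list taken from the example; replace with real Google IPs.
ALLOWED_IPS = {"12.34.56.78"}


def status_for(user_agent, remote_addr):
    """Return 403 when a Google UA arrives from a non-allowed IP, else None."""
    if GOOGLE_UA.search(user_agent) and remote_addr not in ALLOWED_IPS:
        return 403
    return None  # rule does not fire; the request is handled normally
```

A Google UA from any address outside ALLOWED_IPS gets 403, while ordinary browser UAs are left alone whatever their IP.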
Once you have the result, you can whitelist or blacklist the specific IP (or subnet, if you like) so that the whole process has to be done only once.
[googlewebmastercentral.blogspot.com...]
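Assuming the process meant here is the reverse DNS lookup followed by a forward lookup that Google describes for verifying Googlebot, a rough Python sketch with a do-it-once cache might look like this. The function name is_real_googlebot and the in-memory dict are my own choices for illustration; the two resolver arguments exist only so the check can be tried without live DNS.

```python
import socket

# Host name suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

# ip -> verdict, so each address goes through DNS only once.
_verdicts = {}


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup."""
    if ip in _verdicts:
        return _verdicts[ip]
    try:
        host = reverse(ip)[0]  # PTR name for the address
        # The forward lookup must point back at the same IP, or the PTR is spoofable.
        ok = host.endswith(GOOGLE_SUFFIXES) and ip in forward(host)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        ok = False   # note: a transient DNS failure is cached as "fake" too
    _verdicts[ip] = ok
    return ok
```

In practice you would probably persist the verdicts (or feed them into your allow/deny list) rather than keep them in memory, and expire the failures after a while.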
[edited by: GaryK at 1:19 am (utc) on Dec. 16, 2008]
UA: Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
It came from 75.146.149.xx, grabbed some root pages plus some subpages in a hurry, and failed my speed tests. Then again, an MSN bot also recently failed the speed test and got grabby quickly.
The fake Googlebot resolved to a Comcast Business (not home user) Minnesota range:
Comcast Business Communications, Inc. CBC-CM-5 (NET-75-144-0-0-1)
75.144.0.0 - 75.151.255.255
Comcast Business Communications, Inc. CBC-MINNESOTA-9 (NET-75-146-144-0-1)
75.146.144.0 - 75.146.159.255
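If you would rather block by subnet than by single IP, the whois ranges above can be turned into CIDR blocks with Python's standard ipaddress module. This is only a sketch; range_to_cidrs is a helper name I made up, and since the last octet of the visitor's address is elided, the check below uses the whole 75.146.149.0/24.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert a whois-style first/last address pair into CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))


parent = range_to_cidrs("75.144.0.0", "75.151.255.255")
minnesota = range_to_cidrs("75.146.144.0", "75.146.159.255")

# The last octet is elided in the post (75.146.149.xx), so test the whole /24.
seen = ipaddress.ip_network("75.146.149.0/24")
in_minnesota = any(seen.subnet_of(net) for net in minnesota)
```

The two ranges work out to 75.144.0.0/13 and 75.146.144.0/20, and the /24 the visitor came from sits inside the Minnesota block.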
Had to 403 him.
Here's the exact UA, pasted:
"Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
(don't know if this box will change it tho, split line there)
It tried twice more after an hour or so, same UA. I 403'd it after the first time. It hasn't been back since the 23rd.
But I have my site monitoring write the original records out to a .txt file hourly, in addition to writing the complete, unparsed original record to a log database table keyed by IP + date/time.
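For anyone wanting a similar setup, here is a rough sketch of that kind of logging in Python with SQLite. Everything in it (table name, column names, file naming scheme, function names) is an assumption for illustration and not the actual monitoring code described above.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def open_log(path=":memory:"):
    """Open (or create) the log table, keyed by IP + timestamp."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS access_log (
               ip     TEXT NOT NULL,
               hit_at TEXT NOT NULL,
               raw    TEXT NOT NULL,   -- complete original record, unparsed
               PRIMARY KEY (ip, hit_at))"""
    )
    return db


def hourly_name(when):
    """Name of the text file that collects one hour of raw records."""
    return when.strftime("access-%Y%m%d-%H.txt")


def record_hit(db, ip, when, raw_line, text_dir=None):
    """Store the raw record in the table and, optionally, the hourly file."""
    # Two hits from one IP with an identical timestamp would violate the key.
    db.execute("INSERT INTO access_log VALUES (?, ?, ?)",
               (ip, when.isoformat(), raw_line))
    db.commit()
    if text_dir is not None:
        with open(Path(text_dir) / hourly_name(when), "a") as fh:
            fh.write(raw_line + "\n")
```

Keeping the raw line untouched means you can re-parse later if a new fake UA pattern turns up.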
Can you briefly characterize their behavior? I have no idea what they would do, because I'm immediately booting them from my sites. As a result, all I can say is that each fake Gbot instance seems to go away after it gets a 403-Forbidden response -- I have no idea what they'd do if allowed access.
Thanks,
Jim
I'm getting three or four bad googles a day but unfortunately I have insufficient time to analyse them properly.
Of three hits today on our UK-based server with several virtual sites...
Site (A): UK broadband IP hit home page on site once and went away with a 403. Nothing else seen.
Site (B): UK broadband IP hit three related sites (only one has trap installed) before getting a 403 on site.
Site (C): US broadband IP hit home page then pricelist page, 403 each time, then left.
(B) came in with an unterminated folder in the URL (i.e. no trailing "/"), got a 301 and then came back with two hits to the same home page, 1 second between each hit, so a slow browser or a robot?
(C) suggests prior knowledge of the site structure, with 2.5 minutes between hits. The site is very low traffic, so it was easy to notice a hit to robots.txt 30 minutes earlier and another hit 2 seconds after that on the home page from Inktomi IP 74.6.17.n.
All instances had only the single google UA.
The Inktomi IP range is probably not significant. Various IPs in the range appeared in all logs as above although not as close as the one noted. The range also read other files such as CSS so probably shared between robot and page-checker.
Whether my suggestion of rotating UAs is correct or not I can't be sure. It was just a feeling. Today's batch suggests not, since as I understand it the UAs rotate per access, and there has been no previous access from these IPs with a different UA (that I can find!).
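Checking logs by hand for that is tedious, so here is a small Python sketch that looks for IPs that have used both a Google UA and some other UA. It assumes you have already pulled (ip, user_agent) pairs out of your logs; the function name and the simplified UA pattern are mine, for illustration only.

```python
import re
from collections import defaultdict

# Loose match for anything claiming to be a Google crawler.
GOOGLE_UA = re.compile(r"Googlebot|-Google", re.IGNORECASE)


def ips_mixing_google_ua(hits):
    """Return sorted IPs seen with both a Google UA and a non-Google UA.

    hits is an iterable of (ip, user_agent) pairs.
    """
    uas_by_ip = defaultdict(set)
    for ip, ua in hits:
        uas_by_ip[ip].add(ua)

    mixed = []
    for ip, uas in uas_by_ip.items():
        flags = [bool(GOOGLE_UA.search(ua)) for ua in uas]
        if any(flags) and not all(flags):  # at least one of each kind
            mixed.append(ip)
    return sorted(mixed)
```

An empty result over a long enough stretch of logs would support the "single UA per IP" observation; any hits would point back towards rotation.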