Sujan

msg:3614913 | 8:51 pm on Mar 30, 2008 (gmt 0) |
PS: Some info for your makers: IP: 66.249.85.68 Hostname: ff-in-f68.google.com User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) Details: "kdlii" instead of "KDLII" in url, not replacing DKI
|
Sujan

msg:3615206 | 10:37 am on Mar 31, 2008 (gmt 0) |
Last night 74.125.16.37 started to do the same...
|
nakita_dog

msg:3615315 | 1:32 pm on Mar 31, 2008 (gmt 0) |
Why not just account for the lowercase version and do a 301 redirect to the mixed-case version?
|
Sujan

msg:3615598 | 7:10 pm on Mar 31, 2008 (gmt 0) |
Because that would be ~ 1.000.000 possible redirects :)
|
Receptional Andy

msg:3615603 | 7:13 pm on Mar 31, 2008 (gmt 0) |
| that would be ~ 1.000.000 possible redirects |
| Is there a pattern between URLs that would need to redirect? If so, it's likely an accomplishable task. Sure, if bots make mistakes the makers should fix them, but there's no harm in a helping hand if possible :)
|
jdMorgan

msg:3615623 | 7:49 pm on Mar 31, 2008 (gmt 0) |
Are you sure that's AdwordsBot, and not a scraper crawling through a Google proxy? I've seen abuse from those "ff-in-fNN.google.com" hosts, and my impression is that they're not addresses used internally by Google. No, I'm not sure, but googlebots normally identify themselves as such, and not as browsers. Jim
|
Receptional Andy

msg:3615633 | 8:01 pm on Mar 31, 2008 (gmt 0) |
Good catch Jim. If it's Google then they should sort out their (r)DNS/proxies. They're registered as allocated 66.249.64.0/19 so I suspect it's their responsibility in any case. [edited by: Receptional_Andy at 8:03 pm (utc) on Mar. 31, 2008]
|
Sujan

msg:3615709 | 10:07 pm on Mar 31, 2008 (gmt 0) |
Receptional Andy, I of course already did. I just added a lowercased column to the database that is now used as a fallback. Works for now, but what if the bot decides to reverse strings one day?

jdMorgan, I am. Google Adwords bots use at least +/- 65 IPs and 3 to 15 useragents (depends on how you count) these days. The days when they always identified themselves as "Googlebot" or "Adsbot" are long gone. Second place is "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", followed by "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)". Because neither of them replaced the dynamic keyword insertion like they should have, I'm sure it can't be anybody from outside, because nobody knows these urls (and nobody can, because from the outside you can't distinguish our params from e.g. the creative id). You see, I did my homework this time ;)

And of course, I just hope somebody from the Google Adsbot team reads this and reacts. Maybe... Jan
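
The lowercased-column fallback described above can be sketched roughly as follows. This is only an illustration under assumptions: the in-memory dicts stand in for the database columns, the names `build_index` and `resolve_key` are made up for the example, and it assumes no two keys differ only by case.

```python
def build_index(keys):
    """Build exact and lowercased lookup tables for the given URL keys."""
    exact = {key: key for key in keys}
    # Lowercased fallback; assumes no two keys differ only by case.
    lowered = {key.lower(): key for key in keys}
    return exact, lowered


def resolve_key(requested, exact, lowered):
    """Return (canonical_key, needs_redirect), or (None, False) if unknown."""
    if requested in exact:
        return requested, False
    canonical = lowered.get(requested.lower())
    if canonical is not None:
        # The caller could answer with a 301 to the canonical mixed-case URL.
        return canonical, True
    return None, False
```

With a lookup like this, the redirect target is computed per request, so the ~1.000.000 possible redirects never have to be listed one by one.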
|
jdMorgan

msg:3615879 | 2:28 am on Apr 1, 2008 (gmt 0) |
66.249.84.nn is a crawler range
66.249.86.nn is also a crawler range
66.249.85.nn, however, is not a crawler range.

Of the addresses within that range that do resolve, all resolve to the ff-in-fNN.google.com hosts. I'd like to find out exactly what the ff-in-fNN.google.com hosts are intended to be used for. Jim
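
For anyone who wants to flag these hosts in their own logs, here is a minimal sketch. It assumes the hostname shape is exactly "ff-in-f" plus digits plus ".google.com", as quoted in this thread; the names `is_ff_host` and `reverse_lookup` are invented for the example, and a matching PTR record on its own proves nothing about who operates the address.

```python
import re
import socket

# Hostname shape quoted in the thread, e.g. "ff-in-f68.google.com".
FF_HOST = re.compile(r"^ff-in-f(\d+)\.google\.com\.?$", re.IGNORECASE)


def is_ff_host(hostname):
    """True if the hostname has the ff-in-fNN.google.com shape."""
    return FF_HOST.match(hostname) is not None


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return None
```

A typical use would be `is_ff_host(reverse_lookup(ip) or "")` for each address of interest; the lookup needs working DNS, so results depend on the resolver.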
|
phranque

msg:3616118 | 11:49 am on Apr 1, 2008 (gmt 0) |
i've been getting small activity only from .88 - going back 5 weeks i see:
- once some but not all weeks i see HTTP GET of /google0123456789abcdef.html and /noexist_0123456789abcdef.html by user agent "Google-Sitemaps/1.0". (returning 200's and 404's.)
- 26 Feb HTTP GET one page and associated urls by user agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; InfoPath.1)", which was referred by a Gsearch in which this url was top 5. doesn't appear at first glance to have anything to do with adwords.
- 27 Feb one HTTP GET of a .pdf by user agent "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12", which was referred by a Gsearch in which this url was top 5. doesn't appear at first glance to have anything to do with adwords.
- 30 Mar HTTP GET one adwords destination url by user agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", which was referred by a Gsearch referrer that appears to be manufactured from the destination url. as in if the dest url was "www.example.com/scriptname.cgi?yadda=yadda", then the search is made to look like "site:www.example.com scriptname", a search for which no results appear. if i had to guess, it's human testing.
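
To check other log entries for the same kind of manufactured referrer, the pattern described in the last item can be reproduced with a small sketch. This is a guess at the rule based only on the single example given (host plus script name without its extension); `manufactured_query` is a made-up name, not anything Google documents.

```python
import posixpath
from urllib.parse import urlsplit


def manufactured_query(dest_url):
    """Build the 'site:host scriptname' query seen in the referrer."""
    # urlsplit only recognises the host when a scheme or leading "//" is present.
    if "://" not in dest_url:
        dest_url = "//" + dest_url
    parts = urlsplit(dest_url)
    script = posixpath.basename(parts.path)
    # Drop the extension: "scriptname.cgi" becomes "scriptname".
    name, _ext = posixpath.splitext(script)
    return f"site:{parts.netloc} {name}"
```

Comparing this string against the decoded search query in the referrer would show whether other hits follow the same pattern.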
|
jdMorgan

msg:3616152 | 12:48 pm on Apr 1, 2008 (gmt 0) |
Phranque, please confirm: It's 66.249.85.nn we're discussing here; and 66.249.88.nn *is* within their known "internal use" range. For some reason, .85.nn constitutes a "hole" in their otherwise-contiguous range from 66.249.64.00 through 66.249.95.254. Jim
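
A quick way to sort logged addresses against these ranges is sketched below. It assumes the 66.249.64.0/19 allocation quoted earlier in the thread (which covers 66.249.64.nn through 66.249.95.nn) and treats 66.249.85.0/24 as the "hole"; the function name `classify` and its labels are made up for the example.

```python
import ipaddress

# Allocation quoted earlier in the thread: 66.249.64.0 through 66.249.95.255.
ALLOCATED = ipaddress.ip_network("66.249.64.0/19")
# The block under discussion, which is not a known crawler range.
HOLE = ipaddress.ip_network("66.249.85.0/24")


def classify(ip):
    """Label an address as 'hole', 'allocated', or 'outside'."""
    addr = ipaddress.ip_address(ip)
    if addr in HOLE:
        return "hole"
    if addr in ALLOCATED:
        return "allocated"
    return "outside"
```

For example, `classify("66.249.85.68")` gives "hole", while 74.125.16.37 falls outside the /19 altogether.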
|
phranque

msg:3616574 | 8:56 pm on Apr 1, 2008 (gmt 0) |
66.249.85.88
|
phranque

msg:3616614 | 10:12 pm on Apr 1, 2008 (gmt 0) |
forgot to add this: that analysis was for any traffic from 66.249.85. and ...88 was the only visitor on that adwords site. (it's a very small campaign) i'll do more analysis on a much larger campaign/site later...
|
phranque

msg:3616693 | 1:16 am on Apr 2, 2008 (gmt 0) |
ok here's the Sitemaps bot access pattern from the .85. range for 5 domains on one server over a 5 week period (with some fields edited or removed for clarity AND obscurity):

66.249.85.133 [27/Feb/2008:20:34:17] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [27/Feb/2008:20:34:17] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.87 [27/Feb/2008:20:34:18] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.87 [27/Feb/2008:20:34:18] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [27/Feb/2008:20:34:17] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [27/Feb/2008:20:34:17] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.85 [27/Feb/2008:20:34:18] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.85 [27/Feb/2008:20:34:18] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.130 [27/Feb/2008:20:34:17] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.130 [27/Feb/2008:20:34:17] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [21/Mar/2008:13:02:26] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [21/Mar/2008:13:02:26] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.87 [21/Mar/2008:13:02:26] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.87 [21/Mar/2008:13:02:26] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [21/Mar/2008:13:02:26] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [21/Mar/2008:13:02:26] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.85 [21/Mar/2008:13:02:26] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.85 [21/Mar/2008:13:02:26] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.130 [21/Mar/2008:13:02:26] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.130 [21/Mar/2008:13:02:26] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [28/Mar/2008:14:09:11] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [28/Mar/2008:14:09:11] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.87 [28/Mar/2008:14:09:11] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.87 [28/Mar/2008:14:09:11] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [28/Mar/2008:14:09:11] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.133 [28/Mar/2008:14:09:11] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.85 [28/Mar/2008:14:09:11] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.85 [28/Mar/2008:14:09:11] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.130 [28/Mar/2008:14:09:11] "GET /google*.html HTTP/1.1" "Google-Sitemaps/1.0"
66.249.85.130 [28/Mar/2008:14:09:11] "GET /noexist_*.html HTTP/1.1" "Google-Sitemaps/1.0"

so all 5 hit simultaneously from various ip's, 3 of the 5 weeks, and the ip "sticks" to the domain from week to week. i'm guessing i will find similar patterns for all sites we are tracking in GWT. i haven't correlated these times with those from the server mentioned in a previous post but the dates look familiar.
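
A rough sketch of how entries like these could be grouped by date to see which ip's show up together. It assumes the trimmed format shown in this post (ip, bracketed timestamp, quoted request, quoted user agent), not a full server log format, and `ips_by_date` is a made-up name. Because the domain field was removed here, the ip-to-domain "stickiness" cannot be checked from these lines alone.

```python
import re
from collections import defaultdict

# Matches the edited-down lines in this post, not a full combined log format.
LINE = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+) '
    r'\[(?P<date>[^:\]]+):(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" "(?P<agent>[^"]*)"$'
)


def ips_by_date(lines):
    """Map each date (e.g. '27/Feb/2008') to the set of ip's seen that day."""
    result = defaultdict(set)
    for line in lines:
        match = LINE.match(line.strip())
        if match:  # silently skip lines that do not fit the format
            result[match.group("date")].add(match.group("ip"))
    return dict(result)
```

Fed the lines above, this should give the same four ip's for each of the three dates.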
|
phranque

msg:3616710 | 2:09 am on Apr 2, 2008 (gmt 0) |
non-Sitemaps-bot access on that server from the following ip's:

66.249.85.68
66.249.85.69
66.249.85.85
66.249.85.88
66.249.85.133

and using the following user agents:

- (as in non-specified)
Mozilla/4.0 (Windows XP 5.1) Java/1.6.0_04
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; WOW64; SV1)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; IEMB3; IEMB3)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.2; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MEGAUPLOAD 2.0)
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.8;MEGAUPLOAD 1.0
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Mozilla/5.0 (compatible; Google Desktop)
|
phranque

msg:3616761 | 4:05 am on Apr 2, 2008 (gmt 0) |
this could be interesting. an unreferred GET of a keyword destination but without the typical adwords parameters from 74.125.16.37 using "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)". subsequent GETs of objects on that page are requested from 74.125.16.37 as well as 66.249.85.68 using the same agent, including one case of a 301 returned to one ip and the subsequent request made by the other. maybe some kind of proxy thing happening? again, i'm guessing a manual check for flagged situations, since this particular keyword phrase happens to be one that gets high impressions but low CTR for us and the landing page would certainly pass the relevance test. this is from a largish campaign with thousands spent per month.
|
|