Google IP Requesting Images

No user agent for the images

         

incrediBILL

5:45 pm on May 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's a thread in the Google search forum that will probably interest all of you:
[webmasterworld.com...]

The IP is 74.125.16.67, and I see it requesting images as the OP over there stated, but I also see a few instances of "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14", which typically indicates screen shots, but I'm not so sure in this case.

I don't see it using robots.txt either.

Anyone have anything to share that can shed some light on this?

Thanks.

wilderness

6:12 pm on May 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The following is from Feb 11, 2008.
It appears to be some kind of pre-fetch:

74.125.16.70 - - [11/Feb/2008:04:13:17 -0600] "GET /MyFolder/ HTTP/1.1" 200 13034 "Valid other website" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
71.43.82.zzz - - [11/Feb/2008:04:13:52 -0600] "GET /SameFolder/ HTTP/1.1" 200 13034 "SAME Valid other website" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

71.43.82.zzz <snip>images related to index pages of requested folder</snip>
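
For anyone who wants to pull the same fields out of their own logs, here is a minimal sketch of a parser for lines in this shape. It assumes the standard Apache "combined" log format; the names `LOG_RE` and `parse_line` are made up for illustration and are not from any tool mentioned in this thread.

```python
import re
from datetime import datetime

# Assumed layout (Apache "combined"):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Convert "11/Feb/2008:04:13:17 -0600" into a timezone-aware datetime.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

# First log line quoted above, unchanged.
SAMPLE = ('74.125.16.70 - - [11/Feb/2008:04:13:17 -0600] "GET /MyFolder/ HTTP/1.1" '
          '200 13034 "Valid other website" "Mozilla/4.0 (compatible; MSIE 7.0; '
          'Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"')

record = parse_line(SAMPLE)
print(record["ip"], record["time"].isoformat(), record["path"])
```

Masked addresses such as 71.43.82.zzz still parse, because the IP field is matched as any run of non-space characters.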

There was also some other activity (possibly from one or both IPs) that I failed to record: the 74.125.16.70 IP requested additional pages on this site, referred from pages on my other site (also not recorded).

incrediBILL

8:04 pm on May 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sounds like you think it's the web accelerator?

wilderness

8:16 pm on May 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not sure, Bill.

I'd need to pull a DVD and look at back-up logs from February (I remove my active logs on the 1st of each month).

Made note of it; however, I did NOT implement a deny for either the Google IP or the RR IP.

Don't recall if I've seen it since or not.

wilderness

3:14 am on May 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bill,
Dug out the Feb logs.
1) The Google IP grabs the page before the visitor (using the visitor's UA).
2) The visitor's IP grabs the same page.
3) The visitor CLICKS a link to a different folder and page.
4) The visitor CLICKS a link to the sister website.
5) The Google IP spiders some, but not all, links off the page requested in line (4) above.

All this on the same day and within 2.5 minutes.
Please note: when the visitor's IP was used, all of the page's images were requested as well.
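
The first two steps of that sequence (one IP fetches a page, then a different IP fetches the same page with the same UA shortly after) can be searched for mechanically. The following is only a sketch under stated assumptions: the records are already parsed into dicts, the function name `find_prefetch_pairs` and the 60-second window are invented for illustration, and the sample data reuses the two Feb 11 requests with the folder name treated as identical.

```python
from datetime import datetime, timedelta

def find_prefetch_pairs(records, window=timedelta(seconds=60)):
    """Pair each request with an earlier request from a different IP that
    asked for the same path with the same user agent within `window`."""
    pairs = []
    ordered = sorted(records, key=lambda r: r["time"])
    for i, later in enumerate(ordered):
        for earlier in ordered[:i]:
            if (earlier["ip"] != later["ip"]
                    and earlier["path"] == later["path"]
                    and earlier["agent"] == later["agent"]
                    and later["time"] - earlier["time"] <= window):
                pairs.append((earlier, later))
    return pairs

# Illustrative records modelled on the two Feb 11 requests (35 seconds apart).
AGENT = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; "
         ".NET CLR 1.1.4322; .NET CLR 2.0.50727)")
records = [
    {"ip": "71.43.82.zzz", "time": datetime(2008, 2, 11, 4, 13, 52),
     "path": "/MyFolder/", "agent": AGENT},
    {"ip": "74.125.16.70", "time": datetime(2008, 2, 11, 4, 13, 17),
     "path": "/MyFolder/", "agent": AGENT},
]

for first, second in find_prefetch_pairs(records):
    print(first["ip"], "fetched", first["path"], "before", second["ip"])
```

A match only shows that two IPs asked for the same page with the same UA close together; it does not prove which service was behind the first request.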

The Google IP returned five days later, requesting a different page on one site from a valid (DMOZ-listed) referral.

The Google IP returned 13 days after the initial request, on the other website, from a "valid" RIPE Google search and was denied access (i.e., RIPE is denied access to my sites).

Hope this helps.

Don

venti

3:46 am on May 30, 2008 (gmt 0)

10+ Year Member



We see the 74.125.16.70 IP quite often. Most times it is the Coop Feedcatcher and the Feedfetcher (we have several coop feeds and an iGoogle app); however, some other user agents have shown up:

Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+en-US;+rv:1.8.0.7)+Gecko/20060909+Firefox/1.5.0.7

The version of Firefox gets updated as time goes on:

Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+en-US;+rv:1.8.1.3)+Gecko/2007030919+Firefox/2.0.0.3

We have also seen this same IP address request numerous images, always without a user agent.
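
To list image requests that arrive with no user agent, something like the sketch below may help. It works on already-parsed records; the 74.125.0.0/16 network, the extension list, and the name `blank_agent_image_hits` are assumptions chosen for illustration, and the sample paths are hypothetical.

```python
import ipaddress

# Assumption: watch the whole 74.125.0.0/16 block rather than a single address.
WATCHED = ipaddress.ip_network("74.125.0.0/16")
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png")

def blank_agent_image_hits(records, network=WATCHED):
    """Return image requests from `network` whose user agent is empty or '-'."""
    hits = []
    for rec in records:
        try:
            addr = ipaddress.ip_address(rec["ip"])
        except ValueError:
            continue  # skip masked or malformed addresses
        if (addr in network
                and rec["agent"] in ("", "-")
                and rec["path"].lower().endswith(IMAGE_EXTS)):
            hits.append(rec)
    return hits

# Hypothetical records; the paths are placeholders, not real log data.
records = [
    {"ip": "74.125.16.70", "path": "/images/photo.jpg", "agent": "-"},
    {"ip": "74.125.16.70", "path": "/index.html", "agent": "-"},
    {"ip": "74.125.16.70", "path": "/images/logo.gif", "agent": "Feedfetcher"},
    {"ip": "71.43.82.zzz", "path": "/images/photo.jpg", "agent": "-"},
]

for hit in blank_agent_image_hits(records):
    print(hit["ip"], hit["path"])
```

Apache writes "-" when a header is missing, so the check accepts both "-" and an empty string.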

Samizdata

5:36 am on May 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I often see visits from various Google IPs including the 74.125.xx.xx range.

Sometimes they take a single HTML file, sometimes a JavaScript, or a CSS, or a few images.

Sometimes it's a Linux UA, sometimes Windows, sometimes none at all (especially for images).

Sometimes they work in tandem with GoogleBot, sometimes with the Wireless Transcoder.

I have images, CSS and JavaScript disallowed in robots.txt and GoogleBot itself never takes these.
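
As a rough illustration of that kind of setup, the sketch below feeds a sample robots.txt to Python's standard `urllib.robotparser` and asks what a compliant crawler would be allowed to fetch. The directory names are hypothetical, since the actual rules are not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: images, CSS and JavaScript kept in their own folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /css/
Disallow: /js/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """True if a crawler that obeys robots.txt may fetch `path` as `agent`."""
    return parser.can_fetch(agent, path)

for path in ("/index.html", "/images/logo.gif", "/css/site.css", "/js/menu.js"):
    print(path, allowed("Googlebot", path))
```

This only models a crawler that reads and obeys robots.txt; a client that never requests the file is not constrained by it.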

So I assume it is quality control, and they are making sure that I am not trying to fool them.

As long as they are really from Google I am unconcerned.

Others can expect no mercy.
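
One common way to check whether a request is "really from Google" is a forward-confirmed reverse DNS lookup. The sketch below is a minimal version of that idea, not anything taken from this thread: the accepted hostname suffixes are an assumption based on Google's published guidance for verifying Googlebot, and the other Google services discussed here may resolve differently or not at all. The resolver arguments are injectable only so the function can be exercised without network access.

```python
import socket

# Assumption: hostnames under these domains are treated as Google's.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_host(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve `ip`, check the hostname suffix, then confirm that the
    hostname resolves forward to the same IP."""
    try:
        host = reverse(ip)[0]
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(host)[2]
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

Calling `is_google_host("74.125.16.70")` performs two live DNS lookups, so the result depends on the current DNS records for that address.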