Forum Moderators: open
74.125.75.xx - - [24/Feb/2009:01:23:45 -0500] "GET /robots_excluded/javascript.js HTTP/1.1" 200 701 "http://www.example.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
74.125.75.xx - - [24/Feb/2009:01:23:45 -0500] "GET /robots_excluded/style.css HTTP/1.1" 200 31117 "http://www.example.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
Googlebot requested the index page of this site (which references the files above) a couple of days ago. Note that CSS and JS are robots excluded. There were no other requests from this IP.
No rDNS, but seems like pretty botty behaviour to me. I note the reference to this IP range in the appengine [webmasterworld.com] post also, but I don't think this is the same thing?
Both CSS and javascript files are disallowed in my robots.txt - which Googlebot honours.
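For reference, the relevant part of a robots.txt set up this way would look something like the following (the directory name is taken from the log lines above; the exact rules on the poster's site may differ):

```
User-agent: *
Disallow: /robots_excluded/
```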
But these requests are never from Googlebot.
...
maybe the Web Accelerator but I thought it was axed
Google have indeed withdrawn it recently, but it is still available elsewhere.
...
1. Link posted by another entity elsewhere http://www.example.com/robots_excluded/javascript.js
2. From another server, by sending redirect headers in response to a request (e.g. in PHP), forcing the bot to follow the link.
header("HTTP/1.1 301 Moved Permanently");
header('Location: http://www.example.com/robots_excluded/javascript.js');
Other ways are also possible; internal requests can, of course, be forced in a similar manner.
I ran some tests and it seems the above may happen. I was also asking here
[webmasterworld.com...]
about the landing pages regarding Slurp in the past, when I noticed some strange behaviour.
What I haven't confirmed yet is whether pages accessed this way also end up indexed in the SE results; I haven't seen them indexed by just doing that so far.
Of course, in your case it may not be foul play: the request may have come as a verification step from Google for the secondary files listed with the page content.
To check for attempts to manipulate the SERPs, the search engines have to look at things such as CSS and JavaScript (even where disallowed), and the idea that this is always done with a manual review seems far-fetched to me, given the size of the web.
I believe these requests are automated checks - a robot from Google that does not declare itself.
I allow them because I accept that they are necessary and I have nothing to hide. From what I have read in this forum others block them with impunity, which suggests that a manual check will be forced in such circumstances.
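For anyone who does prefer to block these unidentified fetches, an Apache 2.2-style deny rule along these lines would do it (the /24 range here is illustrative, based on the 74.125.75.xx address in the log above; verify the actual range before blocking):

```
# .htaccess - deny the unidentified IP range (range is illustrative)
Order Allow,Deny
Allow from all
Deny from 74.125.75.0/24
```

Bear in mind this denies every request from that range, including any legitimate Google services that share it.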
Either way, Googlebot itself gets a clean bill of health.
All a bit of a sham, but pragmatism suits me.
...