-- Search Engine Spider and User Agent Identification
---- Google Web Preview
Mokita - 10:33 pm on Oct 28, 2010 (gmt 0)
Did it read/heed robots.txt? Or did it appear to 'share' prior Googlebot robots.txt hits?
Didn't ask for robots.txt. Can't tell if it heeds it, as it only took files that are normally allowed to human visitors, and very few of my files are disallowed anyway. But it certainly wasn't heeding the Disallows for search engine bots, as I don't allow them to index images, scripts or CSS.
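If you want to check which of the logged fetches a bot obeying your robots.txt should have skipped, Python's standard `urllib.robotparser` will evaluate the rules for a given user agent. This is a minimal sketch, not my actual setup. The robots.txt content, the directory layout (`/images/`, `/js/`, `/css/`) and the paths are hypothetical. Plain robots.txt Disallow lines match path prefixes, so the sketch blocks directories rather than file types.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is kept out of image, script and CSS
# directories; every other agent may fetch anything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
Disallow: /js/
Disallow: /css/

User-agent: *
Disallow:
"""

# Hypothetical paths, as they might appear in an access log.
FETCHED = [
    "/articles/deep/page.html",
    "/images/photo.jpg",
    "/js/tabs.js",
    "/css/site.css",
]


def disallowed_for(agent: str, paths: list[str], robots_txt: str) -> list[str]:
    """Return the paths that robots_txt forbids for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    # A bot honouring the Googlebot section should have skipped these three.
    print(disallowed_for("Googlebot", FETCHED, ROBOTS_TXT))
    # Under the catch-all "*" section nothing is blocked.
    print(disallowed_for("SomeBrowser", FETCHED, ROBOTS_TXT))
```

With this parser, an agent string that doesn't contain the token `Googlebot` falls through to the `*` section. So a preview fetcher identifying itself some other way would be treated like a human visitor by these rules, even if it did read robots.txt.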
Did it hit all kinds of files, launch JS, etc.?
Yes, took all supporting files - images, CSS and JS. Can't tell if it launches JS - I only use it for tabbed content.
Did it crawl in a typical Googlebot pattern/rate?
It has visited three of the sites I control (a minority). In all cases the pages were "deep". It didn't take the home page or first-level (category) pages.
On one site, the first visit was on 22 Oct, followed by two visits on 26 Oct, 17 hours apart. The IP was 64.233.172.n, but oddly it fetched a few images using 74.125.75.n.
For the site mentioned in my OP, it has visited twice, on 24 and 28 Oct, from 66.249.82.nn only.
On the last site, the behaviour was different again. It requested three closely related pages and their images in the same second on 26 Oct from 74.125.74.nnn. It returned on 28 Oct and crawled only one page from 74.125.152.nn.
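A per-site breakdown like the one above (visit dates and which address block each request came from) can be pulled from an access log with a short script. This is a minimal sketch with several assumptions. It assumes Apache "combined" log format and IPv4 addresses, and it assumes the preview fetcher's user-agent string contains the substring `Google Web Preview`. The sample lines use documentation-only IP ranges and a hypothetical user-agent string, not real log data. The function name `visits_by_day_and_block` is my own.

```python
import re
from collections import defaultdict

# Apache "combined" format (assumed):
# host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical UA string and documentation-range IPs, for illustration only.
UA = "Mozilla/5.0 (hypothetical; Google Web Preview)"
SAMPLE_LOG = [
    f'192.0.2.10 - - [26/Oct/2010:10:33:01 +0000] "GET /guides/deep/page.html HTTP/1.1" 200 5120 "-" "{UA}"',
    f'192.0.2.10 - - [26/Oct/2010:10:33:01 +0000] "GET /images/photo.jpg HTTP/1.1" 200 20480 "-" "{UA}"',
    f'198.51.100.7 - - [28/Oct/2010:08:00:00 +0000] "GET /guides/deep/other.html HTTP/1.1" 200 4096 "-" "{UA}"',
    '203.0.113.5 - - [28/Oct/2010:09:00:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"',
]


def visits_by_day_and_block(lines, ua_marker="Google Web Preview"):
    """Count requests per (date, IPv4 /24 block) whose UA contains ua_marker."""
    counts = defaultdict(int)
    for line in lines:
        m = LOG_RE.match(line)
        if not m or ua_marker not in m.group("agent"):
            continue  # unparseable line or some other visitor
        # "26/Oct/2010:10:33:01 +0000" -> "26/Oct/2010"
        day = m.group("time").split(":", 1)[0]
        # Keep the first three octets, e.g. 192.0.2.10 -> 192.0.2.0/24 (IPv4 only).
        block = ".".join(m.group("ip").split(".")[:3]) + ".0/24"
        counts[(day, block)] += 1
    return dict(counts)


if __name__ == "__main__":
    for (day, block), n in sorted(visits_by_day_and_block(SAMPLE_LOG).items()):
        print(f"{day}  {block}  {n} request(s)")
```

Grouping by /24 rather than by full address makes it easy to spot a single visit whose supporting files arrive from a different block than the page itself.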