lucy24 - 5:24 pm on May 19, 2011 (gmt 0)
Follow-up which I should have thought of in the first place:
This is the ordinary Web Preview IP-- the one I've htaccess'd out of one directory.
UA: Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.14 (KHTML, like Gecko) Chrome/9.0.597 Safari/534.14
This is identical to the Web Preview UA but note the omission of "Google Web Preview" after "like Gecko".
This is a typical address for Googlebot (Preview uses others in the same range).
UA: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
The Googlebot we all know and love. It did everything by itself with no help from the imagebot.
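For anyone who wants to sort their own logs the same way, here is a minimal sketch that labels the UA strings quoted above. The function name and the labels are made up for illustration, and the "Google Web Preview" check relies only on the substring mentioned in this post, not on any official list.

```python
# Rough classifier for the three UA types discussed in this post.
# The labels ("googlebot", "web-preview", "generic-browser") are hypothetical.

GENERIC_UA = (
    "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.14 "
    "(KHTML, like Gecko) Chrome/9.0.597 Safari/534.14"
)
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def classify_ua(ua: str) -> str:
    """Return a rough label for a user-agent string."""
    if "Googlebot/" in ua:
        return "googlebot"
    if "Google Web Preview" in ua:
        return "web-preview"
    # Looks like an ordinary Chrome browser; only the IP can tell you otherwise.
    if "AppleWebKit" in ua and "Chrome/" in ua:
        return "generic-browser"
    return "other"


print(classify_ua(GENERIC_UA))
print(classify_ua(GOOGLEBOT_UA))
```

Since the generic UA is indistinguishable from a human's browser, the UA check alone is not enough; it would have to be combined with the IP range to catch these visits.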
For each page, the left and right files were collected separately, first by the Googlebot (right or "preview" side) and then by the generic UA (left or "site" side). The Googlebot must have pulled the robots info out of its database, because it didn't request robots.txt right then and there. (The last robots.txt check was about 6 hours earlier. I think they check every 24 hours or so, but it takes several days to process the information.)
Each page was visited from scratch. The supporting files show up with the html page as referrer, the way they do when a human visits.
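One way to check the referrer pattern in your own logs is to group requests by referrer. This is only a sketch: it assumes the Apache "combined" log format, and the sample lines (IP, paths, example.com, "ExampleUA") are invented placeholders, not lines from my logs.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" format:
# ip ident user [time] "METHOD path PROTO" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)


def group_by_referrer(lines):
    """Map each referrer to the list of paths requested with it."""
    groups = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        ref = m.group("referrer")
        key = ref if ref not in ("", "-") else "(no referrer)"
        groups[key].append(m.group("path"))
    return dict(groups)


# Invented sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [19/May/2011:10:00:00 +0000] "GET /dir/page.html HTTP/1.1" 200 5120 "-" "ExampleUA"',
    '192.0.2.10 - - [19/May/2011:10:00:01 +0000] "GET /dir/style.css HTTP/1.1" 200 800 "http://www.example.com/dir/page.html" "ExampleUA"',
    '192.0.2.10 - - [19/May/2011:10:00:01 +0000] "GET /dir/pic.jpg HTTP/1.1" 200 20480 "http://www.example.com/dir/page.html" "ExampleUA"',
]

for referrer, paths in group_by_referrer(SAMPLE).items():
    print(referrer, paths)
```

If the supporting files (css, images) all land under the html page's URL, the visit looks like the from-scratch page load described above.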
Coincidentally I'd only just made some visible changes to one rarely-touched page* so it was easy to confirm that they were using fresh data even before I looked at logs. Three days ago a page in the same directory got a 304, so they had no reason to expect a change.
The "roboted" pages (the ones disallowed in robots.txt) weren't visited by the Googlebot at all. The "site" UA went everywhere, or tried to, in the case of the htaccess'd page.
* I happened to look at pages in this directory-- mainly dating from 2004 with minor edits in 2007-- and said "Eeuw! That's ugly!" But nobody ever goes there, so it isn't worth spending a lot of time.