Over the past few days, I've noticed requests allegedly from iPhone and iPad users, but with abnormal WebKit/Safari User-Agent strings.
At first, I thought maybe a new version of WebKit was to blame -- mobile device makers don't seem to pay much attention to the generally-accepted 'rules' of constructing user-agent strings, and also don't seem to care about maintaining any kind of consistency even among their own products.
So I was considering modifying the regex I use for my whitelist, but then noticed something else: None of the requestors had apparently ever requested a page. They requested the CSS and all of the images, but never the HTML pages themselves.
I dug through the recent logs several times, and saw no page fetches from any mobile devices which could have been used to cache the page and which would not have also cached the images and CSS. I keep all of these resources on a fairly short leash, cache-expiry-wise, and so did not have to go back more than three days to satisfy myself that whatever these requests are, they are probably not real mobile devices. They behave like a bot working from either previously-saved HTML pages or from a list of resource links harvested from those pages.
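For anyone who wants to run the same kind of check on their own logs, here is a rough Python sketch of the idea, not the exact procedure I used. It assumes the common Apache-style "combined" log format, and the list of asset extensions and the function names (is_asset, asset_only_clients) are just placeholders for illustration. It groups requests by IP address plus User-Agent and reports the clients that fetched CSS or images but never a page.

```python
import re
from collections import defaultdict

# Assumed format: Apache-style "combined" log lines, e.g.
# ip - - [time] "GET /path HTTP/1.1" 200 1234 "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

# Assumed list of "page resource" extensions; adjust for your own site.
ASSET_EXT = (".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico")


def is_asset(path):
    """True if the request path (ignoring any query string) looks like an asset."""
    return path.split("?", 1)[0].lower().endswith(ASSET_EXT)


def asset_only_clients(lines):
    """Return sorted (ip, ua) pairs that requested assets but no pages."""
    seen = defaultdict(lambda: {"pages": 0, "assets": 0})
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        key = (m.group("ip"), m.group("ua"))
        kind = "assets" if is_asset(m.group("path")) else "pages"
        seen[key][kind] += 1
    return sorted(k for k, c in seen.items() if c["assets"] and not c["pages"])
```

To use it, pass it an open log file, for example asset_only_clients(open("access.log")), where the file name is whatever yours happens to be. Make sure the logs you feed it go back further than your longest cache-expiry time, or legitimately cached pages will show up as false positives. Grouping by IP is also only approximate, since mobile carriers often rotate or share addresses.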
I don't intend to point out their exact errors here and make their job easier, but here are two representative and complete UAs -- exactly as logged. It should be easy to compare them to your own logs and spot the differences...
Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko)
Mozilla/5.0 (iPad; U; CPU OS 3_2_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko)
The requests came from ISPs (some mobile-specific, some not) in the U.S., Germany, and Japan (so far).
Anyway, either these requests are fake, or I've missed something in the cache-controls that accounts for the odd fetching behavior and it is just coincidental that the UAs appear to be incorrect. I suspect the former...
Jim