IP: various, ranging all over the globe. (So far only from the northern hemisphere, but I'm not prepared to give this any significance.)
UA: various pseudo-human strings, some blatantly robotic, others more realistic.
Pattern: Here's where the profilers have to do their stuff. Each visit consists of exactly four requests, all from the same IP+UA combination. /directory and /filename.html are random, generally different on each visit. The first request is always for an interior, named file that actually exists. www.example.com is my site.
1. GET /directory/filename.html
   REFERER http://www.example.com/directory/filename.html (that is, the same file)
   (Sometimes there is a lag of a few seconds after this request.)
2. GET /fonts/
   REFERER usually http://www.example.com/fonts/
   but sometimes only http://www.example.com/
3. GET /fonts/index.php
   REFERER http://www.example.com/index.php
   (This request gets an automatic 403 because of the .php extension.)
4. GET /
   REFERER http://www.example.com/index.php
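For anyone who wants to search their own logs for this visitor, here is a minimal Python sketch of the four-request fingerprint above. It assumes Apache "combined" log format and uses www.example.com as a placeholder host; the names LOG_RE, looks_like_step and find_visits are hypothetical, not part of any existing tool. It groups GET requests by IP+UA and reports any client with four consecutive requests matching steps 1 to 4.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" log format, e.g.
# 1.2.3.4 - - [10/Jul/2012:12:00:00 +0000] "GET /a/b.html HTTP/1.1" 200 123 "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<ref>[^"]*)" "(?P<ua>[^"]*)"'
)

SITE = "http://www.example.com"  # placeholder host, as in the post


def looks_like_step(n, path, ref):
    """Return True if (path, referer) matches step n (0-based) of the fingerprint."""
    if n == 0:
        # An interior .html page that gives itself as its own referer.
        return path.endswith(".html") and ref == SITE + path
    if n == 1:
        # /fonts/ with referer either /fonts/ or the bare site root.
        return path == "/fonts/" and ref in (SITE + "/fonts/", SITE + "/")
    if n == 2:
        return path == "/fonts/index.php" and ref == SITE + "/index.php"
    if n == 3:
        return path == "/" and ref == SITE + "/index.php"
    return False


def find_visits(lines):
    """Yield (ip, ua) for each client with four consecutive GETs matching the pattern."""
    by_client = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("method") == "GET":
            key = (m.group("ip"), m.group("ua"))
            by_client[key].append((m.group("path"), m.group("ref")))
    for client, reqs in by_client.items():
        # Slide a four-request window over this client's GETs.
        for i in range(len(reqs) - 3):
            if all(looks_like_step(n, *reqs[i + n]) for n in range(4)):
                yield client
                break  # one hit per client is enough
```

The sketch deliberately ignores timing, since the lag after the first request varies. It would also miss the conflated /fonts/index.php/index.php/index.php variant, which has only three requests; that would need an extra rule.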
They are so small and subtle that they slipped under the radar for a long time. When I did a systematic search, I found them going back to mid-May. Visits range from 0 to 5 per month, with 4 so far this month; that's why I finally noticed them.
Minor anomalies: Normally the visits are scattered, but on one calendar date in May there were two visits (same pattern, but everything else different, as usual). A few days ago the robot du jour must have burped, because requests 3 and 4 were conflated into a single request:
GET /fonts/index.php/index.php/index.php
with its usual referer.
Anyone recognize this pattern?
A bit of trivia about the IPs: as I said, nothing noteworthy, except that one recent visitor came from 208.115.125.38, an address some of you may recognize. It was formerly used by dotbot, more recently by ezooms, and now it has apparently got a new roommate. That's when I caved in and blocked the IP, which I had previously classed as "No skin off my nose".
Along the way, I looked up ezooms and was intrigued to learn that apparently nobody has the faintest idea what this robot is doing. Guesses, sure, but no hard evidence. Someone even tried writing to that gmail address and got an immediate bounce. In my case, they are now sulking madly and eating 403s at, as far as I can tell, exactly the same rate at which they used to eat pages. I'll see whether this changes.