robot clusters

         

lucy24

11:18 pm on Aug 11, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Someone out there has decided to spend their summer making up a new robot script, and it's been vexing me since mid-July.

IP: entirely random, but not typically The Usual Suspects or well-known server farms. Any one IP might do just one request, often two, rarely more. Most seem to be from ARIN ranges.
UA: each cluster uses a random selection of exactly five humanoid UAs, always a different set.
Headers: ample and varied; only rarely do they include one that would trigger a lockout.
Requests: around 20-25 within a short time period, typically 20 seconds or so. Each cluster involves a random combination of {some specific page, different each time} AND the / root AND--usually but not always--robots.txt. There is absolutely nothing distinctive about the pages selected; they could easily put all my URLs in a hat and pick one.
Referer: randomly google OR root OR blank OR ... robots.txt. (This vexed me so much that I have added /robots.txt to the bad_ref environmental variable.) Redirected requests due to missing directory slash always give the original wrong request as referer, and always stay with the same IP-and-UA combo. Everything else is random.

Mutter, mutter, grumble.

phranque

11:50 pm on Aug 11, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



sorry about that!
=8)

that actually sounds like pretty optimal scraper behavior...

lucy24

12:49 am on Aug 12, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



that actually sounds like pretty optimal scraper behavior...
How much scraping can you do, if you don't get any supporting files?

One thing I double-checked for before posting was requests for the same page with supporting files in the same general time frame--which would make me suspicious of the apparent human.

phranque

1:09 am on Aug 12, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



if you don't get any supporting files

i was wondering but you hadn't mentioned that.

dstiles

8:08 am on Aug 12, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> UA: each cluster uses a random selection of exactly five humanoid UAs, always a different set.

Are these current browsers, old or ancient?

> Headers: ample and various, rarely some that will trigger a lockout.

Do those headers include SecFetch?

lucy24

4:42 pm on Aug 12, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are these current browsers, old or ancient?

Brand-new, version numbers 99 and up. Standard browsers such as Chrome, Firefox, Edg, Safari (the latter obviously not such high version numbers!). Not the FF/50s and Chrome/60s that are so popular with unimaginative robots, let alone MSIE 6--which I have actually seen in the past year, making me wonder why they even bother.

Do those headers include SecFetch?
Often but not always. I pulled the headers from one cluster and fine-tooth-combed for shared features, but absolutely nothing was common to all--except the basics like Accept: that have to be present or they would be blocked up front, or conversely ones like Connection: that are present in all requests without exception. I've often looked for some blockable aspect of the Sec-Fetch / Sec-Ch family, but they're used by too many humans, especially Android.

I only became aware of these clusters because a noticeable proportion were getting in, which made the time clustering jump out. A significant proportion of the 403s are because the cluster involves a deep interior page claiming to be linked from the root. (Or, even more ridiculously, from robots.txt.)

phranque, I can just about count on my fingers the number of malign robots--as opposed to search engines and the like--that request anything but pages. On rare occasions they get page-plus-any-scripts, though they hardly ever act on the scripts. (Are they doing this to learn how to send in piwik/matomo requests, perhaps for purposes of referer spam? It doesn't do them any good, if so.) But that's robots-in-general, not the clusters that have been plaguing me of late.

I'm not concerned with server load, since thirty requests in the span of 20 seconds is still less work than one typical page with all resources; it doesn't even create a blip in logs.* But it's aggravating.

* One subsection of my personal site involves pages with so many supporting files, I can actually see how many humans have visited just by eyeballing the size of the day's log file.

dstiles

8:38 am on Aug 13, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not sure how one goes about trapping anything like that. :( I have, as you noted, added robots.txt to the referer trap but that appears to be a minimal deterrent.

Is there any order to the requests? eg robots.txt, root, page or are they random-ish?

The thought occurs: a distributed bot? Though I have no idea what they'd be looking for. And most bots like to advertise themselves, and are unlikely to co-operate in using more than a single IP per set.

lucy24

4:51 pm on Aug 13, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is there any order to the requests? eg robots.txt, root, page or are they random-ish?
They appear to be random, though I can't swear to it because my server's access logs are sometimes a little hiccupy: for example, a human visit might show a slew of images at 12:01:42 and then, after those in the logs, the page request at 12:01:41. So when a flurry of requests comes in during the same second, there's no way to be absolutely certain they arrived in the order shown. The order in access logs does seem to match the order in header logs. (And the requests are just far enough apart that logged headers never get tangled up, as can happen with over-speedy robots.)

a distributed bot?
I was thinking infected individual computers, but you'd expect more of those to come from places like Vietnam or the Philippines, not ARIN ranges. Looking at the most recent cluster, I did spot one from Bangladesh and one from Iraq, and a few RIPE, but really it varies all over the map: some lesser-known server or colo ranges, some to all appearances human IP.

Here, for example, is the second-to-last cluster. I've collapsed the UAs into that day's five. They are always different, but to date it is always exactly five. You can see that the only consistent handling of referers is when there is an index redirect involved. In this particular batch, quite a few were blocked due to the Rtt header, which I have yet to see from a human or legitimate robot.

Total number of requests, 25, sent directly to https. This is pretty typical; to date the range is from a low of 14 (did they get interrupted?) to a high of 32.
162.43.242.abc - - [09/Aug/2022:21:02:23 -0700] "GET /ebooks/shaw HTTP/2.0" 301 481 "-" "{Firefox 101}" 
206.204.33.abc - - [09/Aug/2022:21:02:23 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://www.google.com/" "{Chromium 101a}"
213.188.85.abc - - [09/Aug/2022:21:02:23 -0700] "GET / HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Chromium 101b}"
152.39.227.abc - - [09/Aug/2022:21:02:23 -0700] "GET /robots.txt HTTP/2.0" 200 197 "https://www.google.com/" "{Firefox 99}"
162.43.242.abc - - [09/Aug/2022:21:02:23 -0700] "GET /ebooks/shaw/ HTTP/2.0" 200 46167 "https://example.com/ebooks/shaw" "{Firefox 101}"
152.39.227.abc - - [09/Aug/2022:21:02:23 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Chromium 101b}"
149.71.176.abc - - [09/Aug/2022:21:02:23 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://www.google.com/" "{Chromium 101a}"
193.176.22.abc - - [09/Aug/2022:21:02:23 -0700] "GET / HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Chromium 101b}"
31.204.13.abc - - [09/Aug/2022:21:02:23 -0700] "GET /robots.txt HTTP/2.0" 200 197 "https://www.google.com/" "{Firefox 99}"
31.204.13.abc - - [09/Aug/2022:21:02:23 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Chromium 101b}"
76.189.21.abc - - [09/Aug/2022:21:02:24 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://www.google.com/" "{Chromium 101a}"
206.204.4.abc - - [09/Aug/2022:21:02:23 -0700] "GET / HTTP/2.0" 200 7849 "-" "{Safari 14}"
73.0.139.abc - - [09/Aug/2022:21:02:24 -0700] "GET / HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Chromium 101b}"
71.72.184.abc - - [09/Aug/2022:21:02:24 -0700] "GET /robots.txt HTTP/2.0" 200 197 "https://www.google.com/" "{Firefox 99}"
144.142.209.abc - - [09/Aug/2022:21:02:24 -0700] "GET / HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Chromium 101b}"
71.72.184.abc - - [09/Aug/2022:21:02:24 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Chromium 101b}"
141.242.156.abc - - [09/Aug/2022:21:02:24 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://www.google.com/" "{Chromium 101a}"
208.207.171.abc - - [09/Aug/2022:21:02:25 -0700] "GET /robots.txt HTTP/2.0" 200 197 "https://www.google.com/" "{Firefox 99}"
208.207.171.abc - - [09/Aug/2022:21:02:25 -0700] "GET /ebooks/shaw HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Firefox 99}"
206.204.4.abc - - [09/Aug/2022:21:02:29 -0700] "GET / HTTP/2.0" 403 3354 "https://example.com/robots.txt" "{Firefox 99}"
64.79.240.abc - - [09/Aug/2022:21:02:29 -0700] "GET / HTTP/2.0" 200 7849 "-" "{Safari 14}"
64.79.240.abc - - [09/Aug/2022:21:02:33 -0700] "GET / HTTP/2.0" 200 7849 "-" "{Safari 14}"
64.79.240.abc - - [09/Aug/2022:21:02:38 -0700] "GET / HTTP/2.0" 200 7849 "-" "{Safari 14}"
64.79.240.abc - - [09/Aug/2022:21:02:42 -0700] "GET /ebooks/shaw HTTP/2.0" 301 481 "-" "{Safari 14}"
64.79.240.abc - - [09/Aug/2022:21:02:42 -0700] "GET /ebooks/shaw/ HTTP/2.0" 200 46167 "https://example.com/ebooks/shaw" "{Safari 14}"

dstiles

9:28 am on Aug 14, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Vietnam or the Philippines, not ARIN ranges

Judging by my mail server (in the UK), a lot of bad hits come from ARIN ranges. Also India. Not so many on the web server, but some, and often from server ranges.

> Rtt header

That's a new one on me. I'll look further into that one.

I don't have the relevant logs to hand but do the major bots accompany their robots.txt probes with a referer? If not, block the hit for robots.txt + referer. Not sure if that would do any good but it may confuse the bot. Beyond that, sorry, no idea; but I'll keep an eye on the logs for similar.

lucy24

4:35 pm on Aug 14, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



do the major bots accompany their robots.txt probes with a referer?
Never that I can think of. In fact, one purpose of my robots.php rewrite is to screen requests. (This is a recent addition. Originally I did the php rewrite so I could #1 log headers and #2 include a single shared robots file for all sites.) If the robots.txt request includes one or more of:
$_SERVER['HTTP_REFERER'] || $_ENV['noagent'] || $_ENV['bad_agent'] || $_ENV['bad_range'] || $_ENV['lying_bot']

(all of which should be self-explanatory, except that "lying_bot" simply means a humanoid user-agent such as Chrome or Firefox)
then they get a minimalist robots.txt that flatly Disallows everyone. That's why, in the sample posted above, the response sizes for robots.txt are so small.
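
In outline, the screen amounts to something like this (a minimal sketch, not the real file: the flag names are the ones above, but how Apache hands them to PHP, and the shared filename, will vary with the setup):

<?php
// robots.php -- rough sketch of the screening described above.
// Assumes the request for /robots.txt has been rewritten to this script
// and that the noagent / bad_agent / bad_range / lying_bot flags were set
// as environment variables (depending on PHP config they may surface via
// getenv() or $_SERVER rather than $_ENV).
header('Content-Type: text/plain');

$flagged = !empty($_SERVER['HTTP_REFERER'])   // robots.txt fetches shouldn't carry a referer
        || !empty($_ENV['noagent'])
        || !empty($_ENV['bad_agent'])
        || !empty($_ENV['bad_range'])
        || !empty($_ENV['lying_bot']);        // humanoid UA such as Chrome or Firefox

if ($flagged) {
    // Minimalist version: flatly disallow everyone.
    echo "User-agent: *\nDisallow: /\n";
} else {
    // Otherwise serve the single shared robots file used for all sites.
    readfile(__DIR__ . '/robots-shared.txt'); // illustrative filename
}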

Not that it matters a whole lot with robots that don't bother looking at robots.txt until after one or more page requests.

Tangentially: There is, I believe, one isolated place on my main site--it's part of the “At Home with the Robots” subdirectory--that explicitly links to robots.txt, so human requests would legitimately come in with a referer. But those would also have human user-agents, so they'd get the same file either way.

dstiles

10:58 am on Aug 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don't know how this fits in, if at all, but, as I noted above, I put in a block if robots.txt appears in the referer.

This morning I had 7 hits on robots.txt, all within 20 seconds, from a "real" browser with SecFetch. Every IP was different but all from the US. The first three hits were Chrome on NT 6.1 (i.e. Windows 7); the others were MSIE 10 on NT 6.2 (both UAs blocked on my server anyway for claiming stupidly old OSes).

lucy24

5:24 pm on Aug 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



as I noted above, I put in a block if robots.txt appears in the referer
It would never have occurred to me to list robots.txt as bad_ref ... until this cluster started using it.

:: detour to archived logs, going back to late 2013, for some number-crunching ::

Requests giving robots.txt as referer
56% or a bit over half are requests for robots.txt

14% are for favicon.ico (on my sites, favicon is never explicitly named in the html, let alone in a .txt file, and most visitors are allowed to have it, if only because it's less work for the server than a 403)

18% are requests for / root, with the earliest in September 2017--and the second-earliest a full year later.

NO requests for any page other than root with robots.txt as referer

Noteworthy: 18% of all requests giving robots.txt as referer come from the present calendar year. Going purely by age of site, it should be well under 10%. So this looks like an up-and-coming behavior.

As long as I was there, I checked conversely for robots.txt requests with a referer (of any kind). These are about three times as frequent as the reverse (robots.txt as referer), with the obvious auto-referer overlap.

18.5% or a bit under 1/5 give robots.txt as the referer

only about 2% give / root as referer, and this was first seen in 2019.

The rest are a miscellaneous batch ranging from google (yeah, right) to apparent referer spam, including a fair number from DuckDuckBot--whether real or fake I didn't bother to check--along with misspellings of my own site name such as "http:// www.example.com" [sic space] where it's really https://example.com.
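
(The number-crunching itself is nothing elaborate; a single pass over the archived access logs along these lines would produce the same tallies. A rough sketch only: the log location is made up, and it assumes the usual combined format.)

<?php
// Rough sketch: tally requests that give robots.txt as the referer,
// broken down by requested path. Assumes combined-format access logs;
// the glob path is illustrative only.
$byPath = [];
$total  = 0;

foreach (glob('/path/to/archived-logs/access*.log') as $file) {
    foreach (file($file) as $line) {
        // combined format: ... "GET /page HTTP/x.x" status bytes "referer" "user-agent"
        if (!preg_match('/"(?:GET|HEAD|POST) (\S+)[^"]*" \d+ \S+ "([^"]*)"/', $line, $m)) {
            continue;
        }
        if (stripos($m[2], 'robots.txt') === false) {
            continue;               // referer does not mention robots.txt
        }
        $total++;
        $byPath[$m[1]] = ($byPath[$m[1]] ?? 0) + 1;
    }
}

arsort($byPath);
printf("%d requests gave robots.txt as referer\n", $total);
foreach ($byPath as $path => $count) {
    printf("%6.1f%%  %s\n", 100 * $count / $total, $path);
}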

All of this is a bit of a digression from the original theme of robot clusters, but robot behavioral psychology is an endlessly fascinating subject.

lucy24

8:50 pm on Aug 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Follow-up: After several clear days in a row, I got another one yesterday. Of 32 requests, 8 were for robots.txt (all served with the minimalist Disallow-everyone version) and a further 20 were blocked. The 4 that got in were referer-less requests for the root, which are pretty impossible to block when there are no header/IP/UA offenses.

This cluster, like the one before that, added a new and slightly worrying feature: it was followed within seconds by a to-all-appearances-human request for the interior page in question.

Oh, and Today I Learned ... that “ka” is the language code for Georgian, hence Accept-Language: ka-GE in about half the requests.

:: wandering off to investigate Kartvelian languages, because if you can’t at least learn something, what's the use ::

dstiles

1:42 pm on Sep 2, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I got another of these this morning. 24 hits: 12 to robots.txt with a robots.txt referer, 4 others blocked with a 403, and 6 returning 200. Of the 200s, 4 were to INDEX, one of which also requested the two CSS files; the others were to the GDPR and REPAIRS pages. There were also two 403s from the PRODMAP page. All of those pages are linked from INDEX.

A possibility of blocking occurs to me, using this set of hits as an example.

The first hit was to robots.txt with robots.txt as the referer. Use that hit to trigger a delay ON THAT DOMAIN (and possibly only for ARIN / US IPs) of, say, 45 to 60 seconds. Alternatively, trigger a minimal captcha, if the site is a busy one.
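
In PHP terms the trigger half might be no more than touching a per-domain marker file when that first hit arrives (a sketch under those assumptions; the marker path is made up, and any ARIN / US-only narrowing would happen before this point):

<?php
// Sketch of the trigger: when robots.txt is requested WITH robots.txt
// as the referer, record the current time in a per-domain marker file.
// Marker location is illustrative only.
$referer = $_SERVER['HTTP_REFERER'] ?? '';

if (stripos($referer, 'robots.txt') !== false) {
    $marker = sys_get_temp_dir() . '/cluster-lock-' . md5($_SERVER['HTTP_HOST'] ?? 'default');
    file_put_contents($marker, (string) time());   // base time for the delay window
}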

lucy24

5:06 pm on Sep 2, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, so that's where they went. I haven't seen any more since I last posted, so they must have moved along to your site :)

Did all of yours come from the same hostname (widely different IP ranges, but all belonging to the same company)? If so, a short-lived response really might work.

tangor

4:09 am on Sep 3, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Feel left out, haven't found any of the above in the last 18 months' logs. Maybe not widespread yet?

dstiles

8:10 am on Sep 3, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> Did all of yours come from the same hostname

Not sure what you mean. The target was a single web site. The source IPs were all different providers - charter, comcast, oculus etc.

A simple php delay will not work, obviously, and I can't think of a simple way of checking status across several sessions. I'm going to save a base time to a file then check for expiry. If I save a new base time every time there is a robots.txt referer I can probably keep the timeout to around 15 seconds, judging from this one pattern.
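
Something like this, perhaps (a rough sketch of the file-based check, using the same marker file as the trigger sketch above; the 15 seconds is just the figure mentioned here):

<?php
// Sketch of the check: on each request, see whether a base time was
// saved recently; if so, the request falls inside the lockout window.
$marker  = sys_get_temp_dir() . '/cluster-lock-' . md5($_SERVER['HTTP_HOST'] ?? 'default');
$timeout = 15;   // seconds, per the estimate above

if (is_readable($marker) && (time() - (int) file_get_contents($marker)) < $timeout) {
    // Still inside the window: refuse (or substitute a minimal captcha).
    http_response_code(403);
    exit;
}

// Save a new base time whenever a robots.txt referer turns up again.
if (stripos($_SERVER['HTTP_REFERER'] ?? '', 'robots.txt') !== false) {
    file_put_contents($marker, (string) time());
}

Writing a fresh base time on every robots.txt-referer hit means the window keeps sliding while a cluster is active, so the 15-second figure only has to outlast the gaps between hits rather than the whole cluster.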

Just noticed: the time between loading the index file and loading each css file is around 3 or 4 seconds each, so something is deliberately loading css after some "thought" - ie it's not a browser action despite its claim to be safari, chrome etc.

lucy24

4:20 pm on Sep 3, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, oops, I misunderstood what you meant by “ON THAT DOMAIN”. I thought you meant all the requests were coming FROM an identifiable origin. Handling would depend very much on the nature of the site. On mine, f'rinstance, it is very unlikely any human would notice if the entire site simply shut down for 60 seconds.

the time between loading the index file and loading each css file is around 3 or 4 seconds each, so something is deliberately loading css after some "thought"

Yah, I've seen similar behavior in requests that get flagged as “maybe human, maybe not”, though not in the specific context of robot-clustering; those have never asked for supporting files. If it takes several seconds for a page request to be followed by a css or js request, and they never get around to images at all ... I would say I smell a rat, except that rats are actually quite nice-smelling. (“Corn chips” is a common comparison.)

dstiles

9:46 am on Sep 4, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure how to go about timing css and image files.

Anyway, I've built the trap, now to wait for the mice. :)