but never grabbed robots.txt
How odd. On my site, they rarely get anything but robots.txt.
:: closer study of processed logs ::
Through the first half of the year, they collected plentiful pages, but since July it’s been robots.txt and nothing else.
:: further inspection of shared robots.txt ::
Huh. I have a Disallow on AwarioSmartBot, but that seems to be a different robot, from 85.10.219.nnn.
Plain AwarioBot is about ten times as common:
65.21.113.nnn - - [10/Aug/2024:02:51:02 -0700] "GET /robots.txt HTTP/1.1" 200 4196 "-" "Mozilla/5.0 (compatible; AwarioBot/1.0; +https://awario.com/bots.html)"
Earlier in the year, they came from 94.130.237.nnn
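For anyone wanting to run the same check on their own logs, here is a rough Python sketch that counts how often a given robot asks for robots.txt versus anything else. It assumes the combined log format shown in the line above; the names `LOG_RE` and `tally` are made up for illustration, and the sample is just that one log line.

```python
import re
from collections import Counter

# Assumes the combined log format, as in the AwarioBot line quoted above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def tally(lines, agent_substring):
    """Count robots.txt requests vs. all other requests for one user-agent."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None or agent_substring not in m.group("agent"):
            continue  # unparseable line, or some other visitor
        kind = "robots.txt" if m.group("path") == "/robots.txt" else "other"
        counts[kind] += 1
    return counts

sample = [
    '65.21.113.nnn - - [10/Aug/2024:02:51:02 -0700] "GET /robots.txt HTTP/1.1" '
    '200 4196 "-" "Mozilla/5.0 (compatible; AwarioBot/1.0; +https://awario.com/bots.html)"',
]
print(tally(sample, "AwarioBot/"))
```

Matching on "AwarioBot/" with the trailing slash keeps plain AwarioBot separate from AwarioSmartBot, since the latter string does not contain it.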
:: final study of headers and robots.php ::
Oh, all is now clear. Coming from bad_range (65.21), they get a minimalist
User-Agent: *
Disallow: /
and-that's-all. But that would seem to imply they’re compliant. Page requests aren't getting blocked; they're not made in the first place.
I've just poked a hole, and will see what happens.
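The switching described above, where visitors from a bad range get the minimalist robots.txt and everyone else gets the full one, might look something like this. It is only a sketch in Python, not the actual robots.php; treating "65.21" as the whole 65.21.0.0/16 block is my assumption, and `BAD_RANGES`, `MINIMAL` and `robots_body` are hypothetical names.

```python
import ipaddress

# Assumption: "bad_range (65.21)" is taken to mean all of 65.21.0.0/16.
BAD_RANGES = [ipaddress.ip_network("65.21.0.0/16")]

# The minimalist file quoted above.
MINIMAL = "User-Agent: *\nDisallow: /\n"

def robots_body(remote_ip, full_body):
    """Return the robots.txt text to serve to this visitor."""
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BAD_RANGES):
        return MINIMAL  # bad range: blanket Disallow
    return full_body    # everyone else: the normal shared robots.txt

print(robots_body("65.21.113.7", "User-Agent: *\nDisallow: /private/\n"))
```

A robot that honors the blanket Disallow would then show up in the logs exactly as observed: robots.txt requests and nothing else.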
In any case, yours may be a faker; I haven't seen anything from 37.27.129, though there's a scattering of unwelcome robots further down in 37.27. (But where do you get Iran? I just looked it up and got Finland, which wouldn't be so bad except it's, drumroll, Hetzner. That's grounds for a deadbolt right there.)