Does anyone have any closer acquaintance with this robot?
UA: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
IP: various AWS (to date: 3.224, 52.70, 52.91; 54.89)
From: amazonbot@amazon.com
robots.txt: asks, may be compliant
Eagle-eyed readers will note the semicolon in the IP list. That's because one of its (to date) four visits had a--ahem, cough-cough--slightly modified UA with matching From: header:
UA: userAgent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
From: userAgentFrom=amazonbot@amazon.com
This in turn throws suspicion on all the others, though the URL in the UA does make it look legitimate. For a given definition of “legitimate”, given its place of origin.
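For anyone who wants to flag this kind of slip automatically, here is a minimal Python sketch. It assumes the tell-tale sign is a header value that begins with a bare `name=` prefix, as in the modified UA and From: values quoted above. The function name `has_leaked_key` and the pattern are my own invention for illustration, not anything Amazon or any server provides.

```python
import re

# Assumption: a "leaked" header value starts with an identifier followed by "=",
# e.g. "userAgent=..." or "userAgentFrom=...". Normal UA strings start with a
# product token like "Mozilla/5.0", and normal From: values are e-mail addresses.
LEAKED_KEY = re.compile(r"^[A-Za-z][A-Za-z0-9_]*=")

def has_leaked_key(value: str) -> bool:
    """Return True if the header value begins with a bare `name=` prefix."""
    return bool(LEAKED_KEY.match(value.strip()))

if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)",
        "userAgent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)",
        "amazonbot@amazon.com",
        "userAgentFrom=amazonbot@amazon.com",
    ]
    for s in samples:
        print(has_leaked_key(s), s)
```

A hit only says the request was assembled sloppily. It doesn't tell you who sent it.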
It first showed its face on two long visits in April and May of this year. I think this was during my computerless period, which explains why I didn’t notice. In consequence, it didn’t find its name in robots.txt, so I can’t say whether it would have been compliant. (By default, it is blocked on various header-and-IP grounds.)
But wait! The plot thickens. On its most recent visit, a few days ago, there were a total of six requests:
HTTP:
11:58:00 robots.txt from 3.224
11:58:00 root / (blocked) from 3.224
11:58:58 robots.txt (and nothing else) from 52.70
HTTPS:
11:58:15 robots.txt (and nothing else) from 52.70
11:58:32 robots.txt from 3.224
11:58:32 root / (blocked) from 3.224
As it happens, 52.anything sets the bad_range environment variable, which in turn leads to a minimalist robots.txt where everything is disallowed. (This rule wasn’t in effect in April/May.) But 3.anything currently doesn’t, meaning that the 3.224 robots.txt request got the version that lists disallowed visitors by name, and Amazonbot is not (yet) on that list.
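In case the two-robots.txt arrangement isn't clear, here is a rough Python sketch of the logic. The real thing lives in server config, so this only illustrates the decision and is not the actual rule. It assumes "52.anything" means the whole of 52.0.0.0/8. The names `BAD_RANGES`, `MINIMAL_ROBOTS` and `choose_robots` are made up for the example, and the full IP addresses in the demo are placeholders, not addresses from my logs.

```python
import ipaddress

# Assumption: the bad_range flag is set for all of 52.0.0.0/8. The real rule
# may cover other ranges too; this list is illustrative only.
BAD_RANGES = [ipaddress.ip_network("52.0.0.0/8")]

# Minimalist robots.txt: everything disallowed for everyone.
MINIMAL_ROBOTS = "User-agent: *\nDisallow: /\n"

def choose_robots(remote_ip: str, named_robots: str) -> str:
    """Pick which robots.txt body a request from remote_ip receives.

    named_robots is the normal file that lists disallowed visitors by name.
    """
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BAD_RANGES):
        return MINIMAL_ROBOTS
    return named_robots

if __name__ == "__main__":
    # Hypothetical "by name" file; Amazonbot is not (yet) listed in it.
    named = "User-agent: SomeOtherBot\nDisallow: /\n"
    print(choose_robots("52.70.0.1", named))   # placeholder address in 52.x
    print(choose_robots("3.224.0.1", named))   # placeholder address in 3.x
```

So a robot that fetches robots.txt from both ranges sees two different answers, and only the 52.x one tells it to go away.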
Why did it visit twice, from two different IPs?
Hmmm.