Forum Moderators: open
robots.txt? Yes
(See also: amazonaws.com plays host to wide variety of bad bots [webmasterworld.com])
Anyway, it behaved as all bots from Amazon behave: badly!
FWIW:
"Customers of Amazon EC2 may launch and terminate multiple instances over the course of a few hours which means that customers may occupy the same host at different times of the same day. ..." -Report Abuse of AWS Services [aws.amazon.com]
I see a train wreck coming unless these services get with it and implement some method by which real-time rDNS for the "organization" currently using a given cloud IP address is immediately available in some simple, programmatically-accessible form, with 'historic' IP-assignment info kept for at least a week in both program-accessible and human-accessible (e.g. browser-accessible) forms.
Otherwise, legitimate and non-problematic spiders and services that we might want to allow to access our sites will end up blocked, because we have to 'defend' our sites against cloud-based activity that the cloud provider might consider legitimate, but that Webmasters consider to be scraping or abuse.
Right now, I'm blocking AWS and Azure, but I don't know what I'll do if some service that I want to allow to access my sites "moves into the cloud" and uses a non-validatable user-agent string. Unless such services implement UA authentication, as Majestic12 has recently done, this forces a really uncomfortable choice.
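FWIW, the kind of check a validatable UA makes possible is forward-confirmed reverse DNS, the same scheme Googlebot supports: rDNS the connecting IP, check that the hostname ends in the crawler's known domain, then resolve that hostname forward and confirm it maps back to the same IP. A minimal sketch in Python (function names and the suffix list are just illustrative, not anyone's official API):

```python
import socket

def hostname_matches(host, allowed_suffixes):
    """True if the rDNS hostname ends with one of the crawler's known suffixes."""
    return any(host.endswith(suffix) for suffix in allowed_suffixes)

def verify_crawler(ip, allowed_suffixes):
    """Forward-confirmed reverse DNS check for an IP claiming to be a known bot."""
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)    # reverse lookup
    except socket.herror:
        return False                                         # no rDNS at all
    if not hostname_matches(host, allowed_suffixes):
        return False                                         # wrong organization
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return False                                         # hostname doesn't resolve
    return ip in forward_ips                                 # forward must confirm reverse

# e.g. verify_crawler("66.249.66.1", [".googlebot.com"])
```

This is exactly what a generic cloud IP defeats: an EC2 instance rDNSes to a hostname like ec2-x-x-x-x.compute-1.amazonaws.com, which tells you the host is Amazon's but nothing about which customer is behind it today.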
I hear the distant whistle and feel the vibration of the tracks, I see that the bridge trestle is broken, but I have no idea what to do about it...
Jim