I managed to get all our sites pulled from Archive.org some years ago. A pedestrian, though painless, procedure.
Though this serpent obeyed robots.txt, their CIDR is now blocked.
207.241.225.65 - - [25/Sep/2016] "GET /robots.txt HTTP/1.1" 301 596 "-" "python-requests/2.11.0"
207.241.225.65 - - [25/Sep/2016] "GET /robots.txt HTTP/1.1" 200 3144 "-" "python-requests/2.11.0"
NetName: INTERNET-ARCHIVE-1
207.241.224.0 - 207.241.239.255
207.241.224.0/20 Ber...locked
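For anyone who wants to double-check that the whois range above really collapses to that single /20 and that the logged address falls inside it, here is a minimal sketch using Python's standard `ipaddress` module. The variable names are my own, purely for illustration; the addresses are the ones quoted in this post.

```python
import ipaddress

# Range boundaries as reported by whois for INTERNET-ARCHIVE-1.
first = ipaddress.ip_address("207.241.224.0")
last = ipaddress.ip_address("207.241.239.255")

# Collapse the start/end range into the smallest list of CIDR blocks.
networks = list(ipaddress.summarize_address_range(first, last))

# The address seen in the access log.
crawler_ip = ipaddress.ip_address("207.241.225.65")

# True if any of the derived blocks contains the logged address.
is_covered = any(crawler_ip in net for net in networks)

print(networks)
print(is_covered)
```

The same check should work for any further ranges that turn up: swap in the new start and end addresses and feed the resulting CIDR blocks to whatever you use for blocking.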
Kindly let me know if you dig up any more of their ranges.