Previously noted here [webmasterworld.com]
UA: Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
IP: 54.70.40.abc (still AWS, but a different range than noted in the earlier post)
robots.txt: only when it suits them
I've just yanked my authorization for this robot. Earlier, they honored the Disallow rules under their User-Agent section in robots.txt, so they were allowed in. A couple of days ago, they crawled three pages in a disallowed directory. I can conceive of no reason why they would expect to find “academic PDFs” in that directory, so, sorry guys, don't let the door hit you when it slams.
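For anyone else showing this bot the door: the polite version is a robots.txt Disallow against its User-Agent, sketched below. But since this bot only honors robots.txt when it suits them, the enforceable version is a server-level block; the second snippet is one way to do that, assuming Apache with mod_rewrite enabled (adjust for nginx or whatever you run).

    # robots.txt -- a request, which this bot has been seen ignoring
    User-agent: SemanticScholarBot
    Disallow: /

    # .htaccess (Apache, mod_rewrite) -- enforcement: answer with 403 Forbidden
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} SemanticScholarBot [NC]
    RewriteRule .* - [F,L]

Since it crawls from AWS and has already changed ranges once, blocking by IP is a losing game; matching on the User-Agent string is the more durable filter, at least until they change that too.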