SemanticScholarBot

lucy24

9:05 pm on Dec 19, 2017 (gmt 0)

Previously noted here [webmasterworld.com]

UA: Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
IP: 54.70.40.abc (still AWS, but a different range than noted in the earlier post)
robots.txt: only when it suits them

I've just yanked my authorization for this robot. Earlier, they honored a Disallow addressed to their User-Agent in robots.txt, so they were allowed in. A couple of days ago, they crawled three pages in a disallowed directory. I can conceive of no reason why they would expect to find “academic PDFs” in said directory, so, sorry guys, don't let the door hit you when it slams.
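For anyone who wants to check their own logs for the same behavior, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the /private/ directory name, and the disallowed_hits helper are all made up for illustration; substitute your own rules and the (user agent, path) pairs pulled from your access log.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /private/ directory is an assumption.
ROBOTS_TXT = """\
User-agent: SemanticScholarBot
Disallow: /private/

User-agent: *
Disallow:
"""

BOT_TOKEN = "SemanticScholarBot"


def disallowed_hits(robots_txt, requests, bot_token=BOT_TOKEN):
    """Return paths the bot requested even though robots.txt disallows them.

    `requests` is an iterable of (user_agent, path) pairs, for example
    extracted from an access log.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for user_agent, path in requests:
        # Only look at requests whose UA string contains the bot's name.
        if bot_token.lower() not in user_agent.lower():
            continue
        # Pass the short token, not the full UA string: the parser matches
        # on the part of the name before the first "/".
        if not parser.can_fetch(bot_token, path):
            hits.append(path)
    return hits
```

Feeding it the UA string from this post plus a few paths returns only the requests that landed in the disallowed directory. It says nothing about whether the UA is genuine; pair it with an IP check if that matters.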

keyplyr

1:59 am on Dec 20, 2017 (gmt 0)

I've blocked them from the beginning. As noted in the previous discussion, they scrape and republish PDFs (and likely other digital property) without the owner's permission. I consider that copyright infringement.
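One way to block by User-Agent at the application level is sketched below as a small WSGI wrapper. This is an illustration only, not what anyone in this thread actually runs; the block_bots name and the token list are assumptions, and most sites would do the same thing in the server config instead. A UA can be spoofed, so treat this as a first filter.

```python
# Lowercase substrings to look for in the User-Agent header (assumed list).
BLOCKED_UA_TOKENS = ("semanticscholarbot",)


def block_bots(app, blocked_tokens=BLOCKED_UA_TOKENS):
    """Wrap a WSGI app so requests from listed user agents get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrap the existing application object once at startup, e.g. application = block_bots(application).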