SemanticScholarBot

         

lucy24

9:05 pm on Dec 19, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Previously noted here [webmasterworld.com]

UA: Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
IP: 54.70.40.abc (still AWS, but a different range than noted in the earlier post)
robots.txt: only when it suits them

I've just yanked my authorization for this robot. Until now they had honored the User-Agent Disallow in robots.txt, so they were allowed in. A couple of days ago, they crawled three pages in a disallowed directory. I can conceive of no reason why they would expect to find “academic PDFs” in said directory, so, sorry guys, don't let the door hit you when it slams.
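For anyone who wants to check their own logs for the same behavior, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the /private/ directory name, and the list of requested paths are all made up for illustration; they are not taken from the site or logs discussed above. Note that the parser matches on the short robot token ("SemanticScholarBot"), not the full UA string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the directory name is invented for this example.
ROBOTS_TXT = """\
User-agent: SemanticScholarBot
Disallow: /private/

User-agent: *
Disallow:
"""

def disallowed_hits(robots_txt, agent, requested_paths):
    """Return the requested paths that robots.txt disallows for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in requested_paths if not parser.can_fetch(agent, p)]

# Hypothetical paths, as they might be pulled from an access log for this bot.
requested = ["/index.html", "/private/a.html", "/papers/x.pdf", "/private/b.html"]
violations = disallowed_hits(ROBOTS_TXT, "SemanticScholarBot", requested)
print(violations)
```

Any path that shows up in the result is a request the robot made despite being told not to, which is the situation described in this post.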

keyplyr

1:59 am on Dec 20, 2017 (gmt 0)



I've blocked them from the beginning. As noted in the previous discussion, they scrape and republish PDFs (and likely other digital property) without the owner's permission. I consider that copyright infringement.