LittleScraper

         

keyplyr

8:44 pm on Jan 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




UA: LittleScraper 0.1
Protocol: HTTP/1.1
Robots.txt: Yes
Host: Google Cloud
35.192.0.0 - 35.207.255.255
35.192.0.0/12
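
For anyone who wants to check the range against their own logs, here is a minimal sketch using Python's standard `ipaddress` module. It confirms that the /12 covers the same addresses as the dotted range above, and tests a request against both the UA prefix and the network. The function name `is_littlescraper` and the sample addresses are assumptions made for illustration, not anything from the original report.

```python
import ipaddress

# CIDR block reported above (Google Cloud).
BLOCK = ipaddress.ip_network("35.192.0.0/12")

def is_littlescraper(user_agent: str, remote_addr: str) -> bool:
    """Return True when both the UA prefix and the source IP match the report.

    Hypothetical helper: the name and the prefix match are illustrative choices.
    """
    ua_match = user_agent.startswith("LittleScraper")
    ip_match = ipaddress.ip_address(remote_addr) in BLOCK
    return ua_match and ip_match

# First and last address of the /12, to compare with the dotted range.
print(BLOCK.network_address, "-", BLOCK.broadcast_address)

# Sample addresses below are made up for the example.
print(is_littlescraper("LittleScraper 0.1", "35.200.1.2"))
print(is_littlescraper("LittleScraper 0.1", "35.208.0.1"))
```

Matching on the UA alone is easy to evade, and matching on the /12 alone would catch every other Google Cloud customer in that block, so the sketch requires both.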

lucy24

12:37 am on Jan 28, 2018 (gmt 0)


With a name like that, what could possibly go wrong?

"Robots.txt: Yes"
I've seen at least one scraping tool with a user-configurable option to honor robots.txt or not. That strikes me as vaguely analogous to a conscientious, morally upright burglar who will only rob a place if the door happens to be unlocked.
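
To make that kind of switch concrete, here is a rough sketch of how it might look, using Python's standard `urllib.robotparser`. This is not the code of any particular tool. The function name `may_fetch`, the `honor_robots` flag, and the sample robots.txt are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_txt: str, user_agent: str, url: str,
              honor_robots: bool = True) -> bool:
    """Decide whether a crawler would fetch `url`.

    Hypothetical helper: `honor_robots` stands in for the user-configurable
    option described above. When it is False, robots.txt is ignored entirely.
    """
    if not honor_robots:
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network
    return parser.can_fetch(user_agent, url)

# Sample rules, made up for the example.
ROBOTS = """\
User-agent: LittleScraper
Disallow: /
"""

URL = "https://example.com/page"
print(may_fetch(ROBOTS, "LittleScraper 0.1", URL))
print(may_fetch(ROBOTS, "LittleScraper 0.1", URL, honor_robots=False))
```

The point of the sketch is that compliance is a single boolean on the client side. Nothing on the server enforces it, which is why a "Yes" next to robots.txt only tells you the file was requested.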

keyplyr

12:53 am on Jan 28, 2018 (gmt 0)


IMO some UAs request robots.txt only to look compliant and so avoid being blocked.