VelenPublicWebCrawler

TorontoBoy

8:26 pm on May 7, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



UA: VelenPublicWebCrawler (velen.io)
Robots.txt: No
Host: DIGITALOCEAN
IP: 174.138.63.xx
174.138.0.0 - 174.138.127.255
174.138.0.0/17
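If you want to check that an inclusive first-last range like the one above collapses to the CIDR block shown, Python's standard `ipaddress` module can do the conversion. This is a minimal sketch. The helper name `range_to_cidrs` is hypothetical and not from any tool mentioned in the thread.

```python
import ipaddress


def range_to_cidrs(first: str, last: str) -> list:
    """Collapse an inclusive first-last IPv4 range into the fewest CIDR blocks."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    # summarize_address_range yields IPv4Network objects covering start..end exactly
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


print(range_to_cidrs("174.138.0.0", "174.138.127.255"))  # ['174.138.0.0/17']
```

A range that does not fall on a clean power-of-two boundary comes back as several blocks, which is why listing the CIDR form alongside the first-last form is useful for firewall and `.htaccess` rules.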

Scraped me hard.

keyplyr

11:31 pm on May 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't seen it, thanks for posting.
...goal with this crawler is to build machine learning models and extraction algorithms to build structured datasets from raw web pages. This crawler follows robots.txt and meta instructions.

Access is blocked if it comes from a DigitalOcean IP range. I let only a few UAs through from those ranges. Let them build their tools on someone else's bandwidth.
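The "block the hosting range, but let a few UAs through" policy described above can be sketched in a few lines. This is an illustration only, assuming you have the client IP and User-Agent string in hand. `ALLOWED_UA_SUBSTRINGS` and its `ExampleGoodBot` entry are hypothetical placeholders for whatever bots you choose to permit. Only the DigitalOcean block from the first post is listed, and more ranges can be appended.

```python
import ipaddress

# Hosting range reported earlier in this thread (DigitalOcean).
BLOCKED_NETS = [
    ipaddress.ip_network("174.138.0.0/17"),
]

# Hypothetical allowlist: UA substrings you decide to let through anyway.
ALLOWED_UA_SUBSTRINGS = ("ExampleGoodBot",)


def should_block(ip: str, user_agent: str) -> bool:
    """Return True if the request comes from a blocked range and its UA is not allowlisted."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when the address and network are different IP versions.
    in_hosting_range = any(addr in net for net in BLOCKED_NETS)
    if not in_hosting_range:
        return False
    return not any(s in user_agent for s in ALLOWED_UA_SUBSTRINGS)
```

Checking the IP first means UA strings, which are trivially spoofed, only ever grant an exception inside ranges you have already decided to distrust.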

keyplyr

9:30 pm on Sep 20, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Also coming from...

Host: ovh.net
51.68.0.0 - 51.68.255.255
51.68.0.0/16

Using both...
UA: Go-http-client/1.1
UA: VelenPublicWebCrawler (velen.io)
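Because the crawler rotates between its own UA and a generic one, a UA rule needs to match both. The sketch below is a hedged example using case-insensitive substring matching. Note that `Go-http-client/` is the default UA of Go's standard HTTP client, so blocking it will also catch unrelated tools built with Go. Whether that trade-off is acceptable is a per-site decision.

```python
# UA substrings seen from this crawler in the thread.
BLOCKED_UA_SUBSTRINGS = ("VelenPublicWebCrawler", "Go-http-client/")


def ua_is_blocked(user_agent: str) -> bool:
    """Case-insensitive check for any of the blocked UA substrings."""
    ua = user_agent.lower()
    return any(s.lower() in ua for s in BLOCKED_UA_SUBSTRINGS)
```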