VelenPublicWebCrawler


TorontoBoy

8:26 pm on May 7, 2018 (gmt 0)

UA: VelenPublicWebCrawler (velen.io)
Robots.txt: No
Host: DIGITALOCEAN
IP: 174.138.63.xx
174.138.0.0 - 174.138.127.255
174.138.0.0/17

scraped me hard
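
For anyone adding this to a block list, the range-to-CIDR conversion above can be double-checked with Python's standard ipaddress module. This is only a minimal sketch; the function name range_to_cidrs is made up for illustration.

```python
import ipaddress


def range_to_cidrs(start, end):
    """Return the CIDR blocks that exactly cover start..end (inclusive)."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    # summarize_address_range yields the minimal list of networks for the range
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


if __name__ == "__main__":
    # Range as reported in the post above
    print(range_to_cidrs("174.138.0.0", "174.138.127.255"))
```

If the start and end addresses do not line up on a single network boundary, the list will contain more than one block.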

keyplyr

11:31 pm on May 7, 2018 (gmt 0)

I haven't seen it, thanks for posting.

"...goal with this crawler is to build machine learning models and extraction algorithms to build structured datasets from raw web pages. This crawler follows robots.txt and meta instructions"

I block access for anything on a DigitalOcean IP range and let just a few UAs through from those ranges. Let them build their tools on someone else's bandwidth.
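
A rough sketch of that kind of rule in Python, assuming a deny-by-range check with a small UA allowlist. The allowlist entry ExampleFriendlyBot and the function name is_blocked are hypothetical, and only the one range reported in this thread is listed. It is not the actual server config, just an illustration of the logic.

```python
import ipaddress

# Range reported earlier in this thread; a real list would hold more ranges.
BLOCKED_NETWORKS = [ipaddress.ip_network("174.138.0.0/17")]

# Hypothetical allowlist: UA substrings let through even from blocked ranges.
ALLOWED_UA_SUBSTRINGS = ["ExampleFriendlyBot"]


def is_blocked(remote_ip, user_agent):
    """True if the request comes from a blocked range and the UA is not allowlisted."""
    addr = ipaddress.ip_address(remote_ip)
    if not any(addr in net for net in BLOCKED_NETWORKS):
        return False  # outside the blocked ranges, this rule does not apply
    return not any(token in user_agent for token in ALLOWED_UA_SUBSTRINGS)


if __name__ == "__main__":
    # 174.138.0.1 is an arbitrary address inside the range, not taken from any log.
    print(is_blocked("174.138.0.1", "VelenPublicWebCrawler (velen.io)"))
```

Checking the range first and the UA second means a spoofed UA from outside the range is simply ignored by this rule rather than trusted.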

keyplyr

9:30 pm on Sep 20, 2018 (gmt 0)

Also coming from...

Host: ovh.net
51.68.0.0 - 51.68.255.255
51.68.0.0/16

Using both...
UA: Go-http-client/1.1
UA: VelenPublicWebCrawler (velen.io)
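
Since Go-http-client/1.1 is the default UA of Go's standard HTTP client and shows up from plenty of unrelated tools, matching on it alone would be too broad. One possible approach, sketched here under that assumption, is to match the named UA anywhere but only count the generic Go UA when it comes from one of the ranges reported in this thread. The function name looks_like_velen is hypothetical.

```python
import ipaddress

# Ranges reported in this thread (DigitalOcean and OVH).
REPORTED_NETWORKS = [
    ipaddress.ip_network("174.138.0.0/17"),
    ipaddress.ip_network("51.68.0.0/16"),
]


def looks_like_velen(remote_ip, user_agent):
    """Heuristic match for the crawler based on the UA strings and ranges reported here."""
    if "VelenPublicWebCrawler" in user_agent:
        return True  # the bot identifies itself
    if user_agent.startswith("Go-http-client/"):
        # Generic Go UA: only count it when the IP is in a reported range.
        addr = ipaddress.ip_address(remote_ip)
        return any(addr in net for net in REPORTED_NETWORKS)
    return False


if __name__ == "__main__":
    # 51.68.0.1 is an arbitrary address inside the OVH range, not taken from any log.
    print(looks_like_velen("51.68.0.1", "Go-http-client/1.1"))
```

This will still flag other Go-based clients hosted in the same ranges, so treat a match as a hint rather than proof.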