...goal with this crawler is to build machine learning models and extraction algorithms to build structured datasets from raw web pages. This crawler follows robots.txt and meta instructions.
Access is blocked if the request comes from a DigitalOcean IP range. I let only a few user agents (UAs) through from those ranges. Let them build their tools on someone else's bandwidth.
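
A rule like that could be sketched roughly as follows, assuming Python and its standard `ipaddress` module. The CIDR blocks are placeholders taken from the reserved documentation ranges, not the hosting provider's real ranges, and the allowed UA names and the `is_blocked` helper are made up for illustration.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), standing in
# for the hosting provider's published ranges. Not real provider data.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

# Hypothetical allowlist: UA substrings still let through from those ranges.
ALLOWED_UA_SUBSTRINGS = ("ExampleMonitor", "ExampleFeedReader")


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request should be refused."""
    addr = ipaddress.ip_address(remote_ip)

    # Requests from outside the listed ranges are never blocked by this rule.
    if not any(addr in net for net in BLOCKED_NETWORKS):
        return False

    # Inside a listed range: block unless the UA matches the allowlist.
    ua = user_agent.lower()
    return not any(token.lower() in ua for token in ALLOWED_UA_SUBSTRINGS)


if __name__ == "__main__":
    for ip, ua in [
        ("192.0.2.10", "SomeCrawler/1.0"),
        ("192.0.2.10", "ExampleMonitor/2.1"),
        ("203.0.113.5", "SomeCrawler/1.0"),
    ]:
        print(ip, ua, "blocked" if is_blocked(ip, ua) else "allowed")
```

Matching on the UA string alone is easy to spoof, so the allowlist here is only as trustworthy as the clients sending it. In practice the same check is often done in the web server or firewall config rather than in application code.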