CrazyWebCrawler


lucy24

9:50 pm on Apr 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The name has come up in a couple of discussions of distributed crawlers and similar
[webmasterworld.com...]
[webmasterworld.com...]
but they haven't got a thread of their own.

I've just spent some time poring over their page [crazywebcrawler.com], which says in part
If you'd like us to stop crawling your website,
the best thing to do is to block our web crawler using the robots.txt specification.
...
Blocking our web crawler by IP address will not work. Due to the distributed nature
of our infrastructure, we have thousands of constantly changing IP addresses. We
strongly recommend you don't try to block our web crawler by IP address, as you'll
most likely spend several hours of futile effort and be in a very bad mood at the end
of it. You really should just include us in your robots.txt or contact us directly.
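For what it's worth, the robots.txt entry their page is asking for would presumably look like the following. This is a sketch only: the exact User-agent token the crawler honours isn't documented anywhere in this thread, so the token below is assumed to match the UA name seen in logs.

```
# Assumed token -- matches the UA string "CrazyWebCrawler" seen in access logs
User-agent: CrazyWebCrawler
Disallow: /
```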

Well, I guess I will have to contact them directly, since I do not perfectly understand how they can heed robots.txt directives when they have never once asked for robots.txt.

I cross-checked the three IPs that this UA has used. One of their ranges (162.243, long blocked) is shared by seoprofiler and MJ12bot-- which do ask for robots.txt-- but I am inclined to doubt they share robots.txt information. Nobody from 192.241.128.0/17 (also long blocked) has ever asked for robots.txt; same for 128.199.

My goodness. What an astounding coincidence. All three are Digital Ocean. On second thought I won't bother about direct contact; I'll just block the IP and UA both. That should do it.
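Blocking both the UA and the ranges could be done in .htaccess along these lines. A minimal sketch, assuming Apache 2.2-style syntax; the CIDR ranges are the three named above, not the whole of Digital Ocean.

```apache
# Block by user-agent string (mod_rewrite)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} CrazyWebCrawler [NC]
RewriteRule .* - [F]

# Block the three Digital Ocean ranges cross-checked above (mod_authz, 2.2 syntax)
Deny from 162.243.0.0/16 192.241.128.0/17 128.199.0.0/16
```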

trintragula

10:24 am on Apr 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I started to notice visits by crazywebcrawler about six months ago. I eventually realised that it was duplicating every request made by a human visitor - usually within a few seconds, and from IPs all over the world.
I think this was being caused by PrivDog, a third-party tool apparently bundled with web security products from Comodo, an internet security company. The Comodo browsers carry Dragon and IceDragon as keywords in the user agent string, but that does not necessarily indicate that PrivDog is installed, and it's also possible that PrivDog could be used with other browsers.
A search for PrivDog on the web raises a number of related issues.

I have 342 IP addresses for crazywebcrawler in my filter (since with a UA like that they were blocked by default). None has ever followed a link not visible to a human, or wandered into a roboted-out directory. So in spite of the useragent string, this thing is not actually behaving like a crawler (which makes sense for a web security product of this kind), and I would not expect it to read robots.txt. It's possible that Comodo/PrivDog have re-purposed either the software or the UA string for another product.

My list does all seem to come from Digital Ocean. Scrunching it all down, I see:

5.101.96.0/20
46.101.0.0/16
80.240.128.0/20
95.85.0.0/18
104.131.0.0/16
104.236.0.0/16
107.170.0.0/16
128.199.0.0/16
162.243.0.0/16
178.62.0.0/16
188.226.128.0/17
192.34.56.0/21
192.81.208.0/20
192.241.128.0/17
198.199.64.0/18
198.211.96.0/19
208.68.36.0/22

which I think includes some ranges I haven't identified before (this is not the whole of Digital Ocean - just the ones I've seen crazywebcrawler from).
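For anyone wanting to check their own logs against this list, here is a small sketch using Python's standard-library ipaddress module. The ranges are exactly those listed above; the function name is just for illustration.

```python
import ipaddress

# CIDR ranges crazywebcrawler has been seen from (all Digital Ocean,
# per the list above -- not the whole of Digital Ocean's space)
RANGES = [ipaddress.ip_network(c) for c in (
    "5.101.96.0/20", "46.101.0.0/16", "80.240.128.0/20", "95.85.0.0/18",
    "104.131.0.0/16", "104.236.0.0/16", "107.170.0.0/16", "128.199.0.0/16",
    "162.243.0.0/16", "178.62.0.0/16", "188.226.128.0/17", "192.34.56.0/21",
    "192.81.208.0/20", "192.241.128.0/17", "198.199.64.0/18",
    "198.211.96.0/19", "208.68.36.0/22",
)]

def in_listed_ranges(ip: str) -> bool:
    """Return True if the address falls inside any of the listed CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RANGES)

print(in_listed_ranges("162.243.1.2"))  # True: inside 162.243.0.0/16
print(in_listed_ranges("8.8.8.8"))      # False: outside all listed ranges
```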

keyplyr

1:32 am on Apr 18, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_USER_AGENT} (analyz|crawl|seo|spider|walker) [NC]

...and poke holes for friendlies.
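On its own a RewriteCond does nothing; it needs a RewriteRule, and the exception conditions go first. A minimal sketch of the whole block, where "FriendlyCrawler" is a placeholder for whatever trusted bot would otherwise match the pattern:

```apache
RewriteEngine On
# Hole for a friendly: "FriendlyCrawler" is a hypothetical placeholder,
# substitute the UA token of any bot you want to let through
RewriteCond %{HTTP_USER_AGENT} !FriendlyCrawler [NC]
# Block anything else whose UA contains these crawler-ish substrings
RewriteCond %{HTTP_USER_AGENT} (analyz|crawl|seo|spider|walker) [NC]
RewriteRule .* - [F]
```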