thereplicant - 2:23 pm on Sep 28, 2010 (gmt 0)
We've recently been getting a lot of crazy traffic from Ask.com's crawlers, which are hitting us at 20+ requests per second. Is anyone else seeing this behaviour?
Our robots.txt contains the crawl-delay directive (set to 1), which according to the Ask.com FAQ pages is supported by their crawler. I can see that it requested robots.txt, but it's clear that it is ignoring the crawl delay.
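To rule out a mistake on the site's side, it can help to confirm that the crawl-delay line parses as intended. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content is a made-up stand-in (the real file isn't reproduced in this post), and "Teoma/Nutch-1.0" is the product part of the crawler's user agent string quoted further down.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file is not shown in the post.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
Disallow:
"""

def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)

# Product token taken from the crawler's user agent string.
print(crawl_delay_for(ROBOTS_TXT, "Teoma/Nutch-1.0"))
```

If this prints the expected delay, the directive itself is well-formed and the question is only whether the crawler honours it.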
For example, one of the crawlers is as follows:
User agent: Teoma/Nutch-1.0 (Question and Answer Search; email@example.com)
On the 24th of September, there were over 20 instances where the crawler hit our servers at a rate of between 33 and 38 times per second. We had over 100 instances where this crawler hit us at over 20 times per second.
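For anyone who wants to check their own logs for the same pattern, per-second counts like these can be produced with a few lines of Python. This is only a sketch: it assumes the crawler's request timestamps have already been pulled out of the access log at one-second resolution, and the sample data below is synthetic, not taken from our logs.

```python
from collections import Counter

def busy_seconds(timestamps, threshold):
    """Return {second: hit_count} for every second with more than threshold hits."""
    per_second = Counter(timestamps)
    return {ts: n for ts, n in per_second.items() if n > threshold}

# Synthetic example data (hypothetical), one entry per request.
sample = ["03:50:01"] * 35 + ["03:50:02"] * 12 + ["03:50:03"] * 22

print(busy_seconds(sample, threshold=20))
```

The length of the returned dict is the number of "instances" above the chosen rate.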
In total, this crawler hit us 4958 times between 03:49:51 and 03:53:41 on the 24th of September (230 seconds), which works out to an average of roughly 21.6 requests per second.
This is not the only crawler from ask.com that has behaved in such a fashion.
On the 23rd of September, IP 18.104.22.168 (crawler9075.ask.com, same user agent as the other crawler) queried us 5010 times between 23:36:12 and 23:39:29 (197 seconds), which works out to a rough average of 25 times per second.
In fact, every single day for the last week or two we have had Ask.com crawlers coming in and spidering the site at insane speeds like this. One of the most recent was 22.214.171.124 (crawler9073.ask.com), which hit us 4833 times on the 28th of September between 00:18:33 and 00:22:54.
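The average rates for the three bursts can be double-checked from the hit counts and time windows given in this post. A small sketch, assuming each window falls within a single day (the function and variable names are just for illustration):

```python
from datetime import datetime

def average_rate(hits, start, end):
    """Average requests per second between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return hits / elapsed.total_seconds()

# (date, hits, window start, window end) as quoted in the post.
bursts = [
    ("Sep 24", 4958, "03:49:51", "03:53:41"),
    ("Sep 23", 5010, "23:36:12", "23:39:29"),
    ("Sep 28", 4833, "00:18:33", "00:22:54"),
]

for day, hits, start, end in bursts:
    print(f"{day}: {average_rate(hits, start, end):.1f} requests/second")
# Sep 24: 21.6 requests/second
# Sep 23: 25.4 requests/second
# Sep 28: 18.5 requests/second
```

Against a crawl-delay of 1, any of these is well over an order of magnitude faster than requested.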
I've tried emailing firstname.lastname@example.org (the address given in the user agent) but got no response, and then tried contacting Ask.com through their online forms (also no response).
Does anyone have any suggestions on what's happening here, or who I could contact?