nasty distributed crawler

         

gurgeous

5:43 pm on Feb 6, 2009 (gmt 0)


Hi guys. Beware, there's a nasty distributed crawler out there. This one went largely unnoticed on my server for a few days, because each IP address only hits a few pages. Once I figured out what was going on, I put in some defensive measures to lock this thing out.

Yesterday I logged 17,500 requests from 1,400 different IP addresses for this crawler. Must be a botnet. It seems to be coming from these subnets:

128.241.104.x
130.94.x.x
147.203.x.x
168.143.
198.172.x.x
196.65.x.x
205.212.x.x
206.251.230.x
206.71.x.x
207.195.x.x
207.67.x.x
209.59.x.x
(most of these are NTT/Verio)
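
For anyone who wants to check their own logs against the list above, here is a minimal sketch of a subnet test using Python's standard `ipaddress` module. It assumes each "x" wildcard stands for a whole octet, so "130.94.x.x" is read as 130.94.0.0/16 and "128.241.104.x" as 128.241.104.0/24. The truncated "168.143." entry is left out because its width isn't given. The names `SUSPECT_NETS` and `in_suspect_subnet` are made up for illustration.

```python
import ipaddress

# Subnets from the list above, with "x" wildcards read as CIDR blocks.
# That reading is an assumption; the truncated "168.143." entry is omitted.
SUSPECT_NETS = [ipaddress.ip_network(n) for n in (
    "128.241.104.0/24",
    "130.94.0.0/16",
    "147.203.0.0/16",
    "198.172.0.0/16",
    "196.65.0.0/16",
    "205.212.0.0/16",
    "206.251.230.0/24",
    "206.71.0.0/16",
    "207.195.0.0/16",
    "207.67.0.0/16",
    "209.59.0.0/16",
)]


def in_suspect_subnet(ip: str) -> bool:
    """Return True if ip falls inside one of the listed subnets."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a parseable IP address (e.g. a hostname in the log).
        return False
    return any(addr in net for net in SUSPECT_NETS)
```

A subnet match on its own says nothing about intent, since these are large hosting ranges; it is only useful combined with other signals.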

Luckily it's rather buggy, which makes it easy to detect. It looks for files like "index.html" and "default.asp", which don't exist on my server. It also has a tendency to append absolute paths to the current URL, resulting in 404s. It uses this User Agent:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
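
The detection idea described above (this exact user agent plus 404s on "index.html" or "default.asp") could be sketched roughly like this. It assumes access logs in the common Apache "combined" format, which may not match your server, and the names `LINE_RE`, `is_suspect` and `suspect_ips` are hypothetical. The user agent string by itself is also sent by ordinary IE7 browsers, so the sketch only flags it when paired with a 404 on one of the probe files.

```python
import re
from collections import Counter

# User agent string quoted in the post above.
BOT_UA = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; "
          ".NET CLR 1.1.4322; .NET CLR 2.0.50727; "
          ".NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)")

# Files the crawler asks for that do not exist on the poster's server.
PROBE_FILES = ("index.html", "default.asp")

# Assumed log layout: Apache "combined" format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def is_suspect(line):
    """Return the client IP if the line matches the pattern, else None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    path = m.group("path").split("?", 1)[0]  # ignore any query string
    if (m.group("ua") == BOT_UA
            and m.group("status") == "404"
            and path.endswith(PROBE_FILES)):
        return m.group("ip")
    return None


def suspect_ips(lines):
    """Count matching requests per client IP."""
    hits = Counter()
    for line in lines:
        ip = is_suspect(line)
        if ip:
            hits[ip] += 1
    return hits
```

Calling `suspect_ips(open("access.log"))` (the file name is a placeholder) gives a per-IP count; `len()` of the result is the number of distinct addresses involved.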

Is anybody else seeing this?

Adam

incrediBILL

9:48 pm on Feb 7, 2009 (gmt 0)


I checked and didn't see this one. I looked specifically for "default.asp" hits: nada.

GaryK

12:01 am on Feb 8, 2009 (gmt 0)


First contact was on August 31, 2008. As of last week, it was most recently seen on January 31, 2009. I'll know more about this past week later tonight. There have been 6,450 visits in total since first contact. All apparently legit, in that no bad-bot alarms were triggered and no requests for robots.txt were made. I don't keep track of IP addresses. I used to, but gave up when that table in the database hit 64 GB in just a few months! ;)

My conclusion, based only on what I've seen, is that it's a legit human-powered browser. Your data seems to suggest otherwise.

I'll look specifically for default.asp for you in this week's log files later tonight.