Robot - Meaningful Machines

ultra aggressive spider

         

ziegast

6:55 am on Jan 26, 2006 (gmt 0)



I have a load-balanced e-commerce web platform with multiple servers. Several of my servers were dying under heavy load today (thankfully not all of the servers). After figuring out that there was no good internal reason for my machines having problems, I tracked the problem down to a bunch of similar source IP addresses pounding one of my sites with resource-intensive requests.

The source network 64.94.***.*** had a gang of 8 machines hitting one of my stores all at once. They tried to access robots.txt 55 times in one second. They tried to access a shopping cart CGI 132 times in one minute with different cart IDs. They repeatedly accessed the same unchanging product pages over and over again.

Meaningful Machines has lots of press articles (startup hype), but lists no contact information for a live person. My only recourse is to complain to their ISP and warn others.

I have ticket number ****** open with InterNAP. If you have had similar problems recently, call their NOC at <snip> or email <snip> with the ticket number on the subject line. They seem responsive, but they won't really know the scope of the problem their customer is causing until others complain.

This event was a good exercise to motivate me to work on denial-of-service countermeasures. It got me thinking about how I could have detected and responded to this problem automatically, without manual intervention. Humans don't request dozens of pages a second. It is improbable that more than 4 humans from a single network would be shopping at the same store at the same time. If a web client requests robots.txt from my server, future requests from its network should go to a rate-limited server for a minimum of one week.
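A minimal sketch of those three heuristics in Python, assuming requests can be observed one at a time as (IP, path, timestamp). The class name, the /24 grouping, and the request-rate threshold are all assumptions for illustration; only the "more than 4 clients" and "one week" figures come from the paragraph above.

```python
import ipaddress
from collections import defaultdict, deque

# Hypothetical thresholds; tune them against real traffic.
WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 120          # assumed, not from the post
MAX_CLIENTS_PER_NETWORK = 4            # "more than 4 humans ... improbable"
ROBOT_PENALTY_SECONDS = 7 * 24 * 3600  # "a minimum of one week"


def network_of(ip):
    """Group an IPv4 address into its /24 network (an assumed granularity)."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


class AbuseDetector:
    """Hypothetical helper that decides whether a request should be rate-limited."""

    def __init__(self):
        self.hits = defaultdict(deque)  # network -> deque of (timestamp, ip)
        self.robot_until = {}           # network -> time the penalty expires

    def observe(self, ip, path, now):
        """Record one request; return True if it should go to the slow server."""
        net = network_of(ip)
        if path == "/robots.txt":
            # Anything that asks for robots.txt has identified itself as a robot.
            self.robot_until[net] = now + ROBOT_PENALTY_SECONDS

        window = self.hits[net]
        window.append((now, ip))
        # Drop requests that have fallen out of the sliding window.
        while window and window[0][0] <= now - WINDOW_SECONDS:
            window.popleft()

        distinct_clients = {client for _, client in window}
        return (
            self.robot_until.get(net, 0) > now
            or len(window) > MAX_REQUESTS_PER_WINDOW
            or len(distinct_clients) > MAX_CLIENTS_PER_NETWORK
        )
```

In practice the return value would feed whatever the load balancer uses to pick a backend; that wiring is left out here because it depends entirely on the platform.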

[edited by: volatilegx at 9:39 pm (utc) on Jan. 27, 2006]
[edit reason] removed specifics [/edit]

stage

2:03 am on Feb 27, 2006 (gmt 0)



Hi,

I have been hit by them too.

I have just emailed them, and blocked them in our firewall.

This bot is far too nasty. Several thousand hits within a few minutes, sometimes on the same page within a second.

No identification (just Jakarta Common HTTP client), and no information on their website on how to block it or specify a slower scan interval.

I could not determine with 100% certainty which IP addresses the bots are originating from, so for the time being I have blocked
*.133 up to *.159
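For firewalls that take CIDR blocks rather than start/end ranges, a range like .133 to .159 can be collapsed with Python's standard `ipaddress` module. This is only a sketch: the real prefix is masked in this thread, so the `192.0.2.` documentation prefix below is a placeholder, and `is_blocked` is a hypothetical helper.

```python
import ipaddress

# Placeholder prefix (192.0.2.0/24 is reserved for documentation);
# substitute the real network before using this.
first = ipaddress.ip_address("192.0.2.133")
last = ipaddress.ip_address("192.0.2.159")

# Smallest set of CIDR blocks that covers exactly first..last.
blocks = list(ipaddress.summarize_address_range(first, last))


def is_blocked(ip):
    """Return True if ip falls inside any of the summarized blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in blocks)


for block in blocks:
    print(block)
```

An odd-sized range like this one needs several blocks (here a /32, a /31, a /29 and a /28), which is why a single wider block is tempting but would catch addresses outside the range.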

This one is really bad.

wilderness

3:43 am on Feb 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's almost as much crap coming from Internap as there is from Hurricane Elec.

RewriteCond %{REMOTE_ADDR} ^64\.9[45]\.
RewriteRule .* - [F]

wilderness

3:45 am on Feb 27, 2006 (gmt 0)




No identification (just Jakarta Common HTTP client),

May not be a bad idea to add Jakarta to your UA denies.

keyplyr

9:37 am on Feb 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



May not be a bad idea to add Jakarta to your UA denies.

Jakarta HTTP Client is used by various edu libraries, and even the Library of Congress, to verify links. Of course, this may or may not be a good thing depending on how you look at it :)
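One way to square the two views is to deny the user agent by default but exempt networks you trust. A rough Python sketch of that logic follows; `should_deny` is a hypothetical helper, and the allowlisted network is a documentation placeholder, not the address of any real library.

```python
import ipaddress

# Substrings to match against the User-Agent header, case-insensitively.
DENIED_UA_SUBSTRINGS = ("jakarta",)

# Placeholder allowlist (198.51.100.0/24 is reserved for documentation).
# Replace with the networks of link checkers you actually want to admit.
ALLOWED_NETWORKS = (ipaddress.ip_network("198.51.100.0/24"),)


def should_deny(user_agent, ip):
    """Deny matching user agents unless the client is on the allowlist."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOWED_NETWORKS):
        return False
    ua = user_agent.lower()
    return any(fragment in ua for fragment in DENIED_UA_SUBSTRINGS)
```

The allowlist is checked first so that a trusted link checker is never caught by a later, broader user-agent rule.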