Brandwatch/Magpie-crawler

         

Asia_Expat

9:39 pm on Nov 24, 2009 (gmt 0)

10+ Year Member



This nasty little bot has cost me three days and a hefty sysadmin fee to identify the problem. It turns out Brandwatch have been flooding my port 80 with around 28 requests per second. It's been crippling my server for days, causing around 3 hours of cumulative downtime per day...

The following ranges have been blocked and the problem cleared...

94.228.34.192/26
94.228.37.0/27

They're a UK-registered company, and in the UK you can get up to 10 years for DoS attacks. I wonder whether, the next time I'm in the UK, I should demand some kind of satisfaction for my lost time and money...

wilderness

2:03 am on Nov 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Asia,
It's NOT my desire to appear obnoxious or arrogant; however, there are a handful of keywords in User Agents that should be standard denials in every default htaccess or httpd.conf.

One of these standard terms is "crawl".
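To illustrate the idea (a minimal sketch only, in Python rather than actual htaccess directives): a case-insensitive substring check against a keyword list. The function name and the sample User-Agent strings are made up for illustration, and "crawl" is the only keyword taken from this post.

```python
# Keywords that trigger a denial. Only "crawl" comes from the post above;
# extend the tuple with your own terms.
DENIED_KEYWORDS = ("crawl",)


def is_denied(user_agent: str, keywords=DENIED_KEYWORDS) -> bool:
    """Return True if the User-Agent contains any denied keyword (case-insensitive)."""
    ua = user_agent.lower()
    return any(keyword in ua for keyword in keywords)


if __name__ == "__main__":
    # Hypothetical sample strings, not copied from real logs.
    for sample in ("Magpie-crawler", "Mozilla/5.0"):
        print(sample, "->", "deny" if is_denied(sample) else "allow")
```

Note that a plain substring match like this also denies any legitimate agent whose name happens to contain the keyword, which is the trade-off between blacklisting and whitelisting.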

In addition, any major SE that utilizes any of this group of keywords should immediately close its doors, as it has removed any possibility of promoting competence from its org.

There are longtime participants in this forum who have converted from blacklisting to whitelisting; however, many participants use a combination of both.

Many thanks for the heads up.

Don

jdMorgan

3:42 am on Nov 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That sounds like some sort of record for abuse!

What response(s) were you serving to their requests? -- This info might help others avoid the "DOS" effect.

Looks like 94.228.32.0/20 will block the whole lot...
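For anyone who wants to double-check that the wider block really contains the two ranges posted above, here is a quick sketch using Python's standard ipaddress module. The helper name is made up; the CIDR values are the ones from this thread.

```python
import ipaddress

# The wider block suggested above, and the two ranges from the first post.
SUPERNET = ipaddress.ip_network("94.228.32.0/20")
BLOCKED = [
    ipaddress.ip_network("94.228.34.192/26"),
    ipaddress.ip_network("94.228.37.0/27"),
]


def is_covered(net, parent=SUPERNET) -> bool:
    """Return True if every address in `net` falls inside `parent`."""
    return net.subnet_of(parent)  # subnet_of() requires Python 3.7+


if __name__ == "__main__":
    print("Block spans", SUPERNET.network_address, "to", SUPERNET.broadcast_address)
    for net in BLOCKED:
        print(net, "covered:", is_covered(net))
```

The /20 runs from 94.228.32.0 through 94.228.47.255, so both of the smaller ranges fall inside it.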

Jim

Asia_Expat

8:29 am on Nov 25, 2009 (gmt 0)

10+ Year Member



Hi wilderness,

I've been running my own server for a couple of years, learning everything as I go along and indeed, I've sure still got a lot of learning to do. This recent DoS incident was a wake up call for me, no doubt about it.

jdMorgan, I was looking in the wrong place for days, because this flood of bots appeared RIGHT at the same time I had to replace a faulty HDD and rebuild the OS, so I was just concentrating on the hardware/software configuration, pretty stupid really. Their requests were just being accepted like any other request (200 responses); now they're just getting dropped, with no response.

While researching their IPs, I noticed there are plenty of posts on webmaster forums about this crawler, but not with the same IP ranges, so I wonder if they changed recently?

Asia_Expat

12:11 pm on Nov 25, 2009 (gmt 0)

10+ Year Member



I actually just had an email from Brandwatch, acknowledging that their bots were visiting my website for the period I was experiencing problems... and that they're investigating what has happened. They said that my robots.txt allows all bots. At least they're good enough to respond to my complaint, fair play to them for that... I'm not sure any ethical bot would see an 'allow all' as an invitation to DoS a website, however.

jdMorgan

2:33 pm on Nov 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Most likely a bug -- the robot got stuck in a loop and didn't do the usual "wait awhile" function. We've seen that before here with a different robot, IIRC.
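As a rough illustration of what a "wait awhile" function usually amounts to (a sketch under my own assumptions, not anything known about Brandwatch's code): the crawler remembers when it last hit each host and sleeps until a minimum delay has passed. The class name, the default delay, and the injectable clock/sleep parameters are all hypothetical.

```python
import time


class PolitenessGate:
    """Enforce a minimum delay between successive requests to the same host."""

    def __init__(self, min_delay_s=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_s = min_delay_s
        self._clock = clock      # injectable so the gate can be tested without real waiting
        self._sleep = sleep
        self._last_hit = {}      # host -> time of the most recent request

    def wait_turn(self, host):
        """Block until `host` may be requested again; return the seconds slept."""
        waited = 0.0
        last = self._last_hit.get(host)
        if last is not None:
            remaining = self.min_delay_s - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_hit[host] = self._clock()
        return waited


if __name__ == "__main__":
    gate = PolitenessGate(min_delay_s=0.5)
    for _ in range(3):
        slept = gate.wait_turn("example.com")
        print(f"slept {slept:.2f}s before requesting example.com")
```

If a code path skips the wait_turn() call, say inside a retry loop, nothing else throttles the requests, which is one way a flood like the one described in this thread could arise.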

This seems to happen more frequently now that everyone uses 'structured coding' languages and techniques; some programmers let their 'structures' get too long and too deep, and once a routine gets to be several pages long with "If" clauses indented half-way across the page, errors just seem to multiply.

Jim