Forum Moderators: phranque


How Long to Ban Bad Bot Behaviour


TorontoBoy

8:16 pm on Apr 28, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



So a bot has ravaged your site with merciless scraping, done reconnaissance, tried to break in, or shown some other bad behaviour, and you have banned it by bot UA, by IP, or by the IP range of its host. How long do you keep the ban? Years?

I collect the few really terrible and abhorrent host providers and ban their complete ranges. Apart from these consistently bad ISPs, I collect IP ranges, which I date-stamp with comments. After 3 months I comment them out but keep the history, so that if they return it is easy to re-ban them.
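The date-stamping approach above might look something like this in .htaccess (a sketch only: the addresses are RFC 5737 documentation ranges and the comments are hypothetical):

```apache
# --- consistently bad host provider: whole range banned indefinitely ---
# ExampleHost, relentless scraping first seen 2018-01
Deny from 192.0.2.0/24

# --- individual offenders, date-stamped so they can be expired later ---
# banned 2018-02-10 - aggressive scraper, fake Chrome UA
Deny from 198.51.100.7

# banned 2017-11-03 - wp-login probing; commented out after 3 months,
# history kept so it is easy to re-ban:
# Deny from 203.0.113.42
```

Keeping the expired lines as comments costs nothing at request time, since Apache skips comment lines when parsing the file.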

Nothing stays the same, and changes can come quickly. Bot UAs, IPs and host providers all change with time. If I keep all the bans, the htaccess can get quite large and may end up banning only historical ranges and UAs.

Do you have a ban strategy or philosophy?

lucy24

10:33 pm on Apr 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In general, once someone is blocked, they stay blocked.

Back when my primary access control was IP-based, and UA blocks were purely supplementary, I did check back periodically. Then I'd delete the ones that hadn't been around in a year or so--unless their initial behavior was so “terrible and abhorrent” that I didn’t want to take any chances. This applies especially to scrapers attached to misguided humans (“hey, I love this site, lemme download the entire thing from top to bottom before even checking to see if maybe there’s only one directory with stuff I actually love”), since IP can’t be used as a blocking factor.

Now I do the opposite: UAs are blocked for the duration, unless of course they mend their ways. IP blocks are generally supplementary, and I check back every few months to see if they’ve gone away. The latter would be robots that are persistent and offensive--yee haw, let’s pile on the adjectives!--and that send fully humanoid headers, so an IP block is the only thing that works. Generally they get bored and go away, and don’t come back.
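That split--durable UA blocks plus supplementary, periodically reviewed IP blocks--can be sketched in .htaccess like this (Apache 2.2-style access directives; the UA names and the address range are placeholders):

```apache
# UA blocks: stay for the duration unless the bot mends its ways
BrowserMatchNoCase "badbot|evilscraper" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot

# supplementary IP block: a persistent bot sending fully humanoid
# headers, so only an IP block works - recheck in a few months
Deny from 203.0.113.0/24
```

On Apache 2.4 the same idea would use `Require`/`mod_authz_core` instead, but the structure is the same: match the UA, then deny by environment variable or address.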

I do check for blocked humans and try to figure out what they did to get blocked. Most of the time it's something obvious where I can say “Well, tough, I didn’t want you anyway,” but occasionally I’ve had to modify header-based rules, especially those affecting mobiles. Or, in a particularly embarrassing case, I had to modify one of my carefully-crafted referer-based rules because I’d made a minor site-design change that resulted in everyone who clicked on a particular link getting blocked.

keyplyr

10:45 pm on Apr 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I ban all Server Farm IP Ranges [webmasterworld.com] for eternity :)
I allow access only to those agents that benefit my interests. Some server farm IP ranges also host ISPs (humans), so those get access as well.
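A minimal sketch of that carve-out pattern, assuming Apache 2.2-style directives and RFC 5737 documentation addresses in place of real ranges:

```apache
# deny the server-farm range outright, then carve out the
# sub-range known to be resold to a residential ISP (humans)
Order Deny,Allow
Deny from 203.0.113.0/24
Allow from 203.0.113.64/26
```

With `Order Deny,Allow`, the later `Allow` wins for addresses matching both, so the ISP sub-range gets through while the rest of the farm stays banned.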

UAs shown to be bad stay banned until I'm certain they are no longer in operation, usually a year or two; however, some keep coming back.

Blocking Methods [webmasterworld.com]

TorontoBoy

2:53 pm on Apr 30, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thank you both. I will monitor my physical CPU utilization; a big htaccess does not seem to have much effect on it. I will extend my 3-month ban to a year and see if they return.

keyplyr

7:32 pm on Apr 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> It does not seem like a big htaccess has much effect on physical CPU usage
My base level htaccess is over 200kb and Google's Page Speed tool says I have a "fast server."

TorontoBoy

7:55 pm on Apr 30, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks for the benchmark. My slimmer htaccess is currently 32kb, comments excluded. My old, fat htaccess was 100kb, so I have a long way to grow.

keyplyr

8:37 pm on Apr 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Of course it depends what you have in your htaccess. Nested conditions/rules and excessive redirects would certainly add processing time, but in the big picture a flat file is the least of the factors contributing to page speed.
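To illustrate the distinction (patterns are hypothetical): flat deny lines are cheap literal/CIDR matches, while chained rewrite conditions run regexes against headers on every request.

```apache
# cheap: flat deny list, simple address matching
Deny from 198.51.100.7
Deny from 203.0.113.0/24

# costlier: regex conditions evaluated per request
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (badbot|scancrawler) [NC,OR]
RewriteCond %{HTTP_REFERER} spam\.example [NC]
RewriteRule ^ - [F]
```

Either way, the whole file is re-read per request only when directives actually live in .htaccess; moving long lists into the main server config avoids even that cost.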