
Apache Web Server Forum

    
Limit number of 403 errors from user
keyplyr
msg:1495681 - 3:27 am on Oct 28, 2003 (gmt 0)


RewriteCond %{HTTP_USER_AGENT} ^Download\ (Demon|Ninja) [NC,OR]

worked as intended here...

211.1.xxx.213 - - [27/Oct/2003:00:51:10 -0800] "HEAD /favicon.ico HTTP/1.0" 403 - "-" "Download Ninja 2.0"
211.1.xxx.213 - - [27/Oct/2003:00:51:10 -0800] "GET /index.html HTTP/1.0" 403 - "-" "Download Ninja 2.0"
211.1.xxx.213 - - [27/Oct/2003:00:51:10 -0800] "GET /page1.html HTTP/1.0" 403 - "-" "Download Ninja 2.0"
211.1.xxx.213 - - [27/Oct/2003:00:51:12 -0800] "GET /page2.html HTTP/1.0" 403 - "-" "Download Ninja 2.0"
211.1.xxx.213 - - [27/Oct/2003:00:51:12 -0800] "GET /page3.html HTTP/1.0" 403 - "-" "Download Ninja 2.0"
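For reference, a condition like the one above normally sits in a block of several conditions that ends with a RewriteRule forcing the 403. A minimal sketch of the usual pattern follows; the extra user-agents are placeholders for illustration, not keyplyr's actual list:

RewriteEngine On
# Known download-manager user-agents (example names only)
RewriteCond %{HTTP_USER_AGENT} ^Download\ (Demon|Ninja) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
# Last condition carries no [OR]; any match returns 403 Forbidden
RewriteRule .* - [F]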

However, I'm hoping to find a more efficient method of blocking those who attempt to download my entire 200-page website, since this one kept going long enough to add 34 MB to the error_log.

Is there a method to stop any connection to the server after, say, 3 or 4 403s from the same user? (Some generic rule that would kick in before I've tracked down and banned his IP.)
Thanks.

 

jdMorgan
msg:1495682 - 3:45 am on Oct 28, 2003 (gmt 0)

key,

You can ask your host to block them by IP address at the firewall, but that's the only way to "stop a connection."

An alternative is to make your custom 403 page very short - just put a link on it for humans to click for more information, and trim the <head> of the page down to the bare minimum tags. Humans may follow that link, but most bad-bots won't. At least this will keep your bandwidth down.
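A bare-bones version of such a page might look like the sketch below (the filename, link target, and wording are made up for illustration); it would be wired up in .htaccess with something like ErrorDocument 403 /403.html:

<!-- /403.html : minimal custom 403 page (illustrative only) -->
<html>
<head><title>403 Forbidden</title></head>
<body>
<p>Access denied. <a href="/why.html">More information for human visitors</a></p>
</body>
</html>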

I've raised the point before, but it bears repeating: HTTP is a stateless protocol. Each request exists separately and independently of every other request; the server makes no association between them. So while webmasters may write scripts to track "sessions" by associating groups of requests with each other, the server itself does not. As such, solutions for problems like this lie outside the server software, and must be scripted or coded and compiled in separately.
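As an illustration of the kind of external script described above, here is a rough sketch that tallies 403s per IP in the access log and appends a Deny line to .htaccess once an address crosses a threshold. Everything in it (the paths, the threshold, and the assumption that .htaccess already contains an Order allow,deny / Allow from all block) is hypothetical, not something from this thread:

#!/usr/bin/env python
# Sketch only: ban IPs that have drawn too many 403s.
# Assumes .htaccess already has "Order allow,deny" / "Allow from all"
# so that appended "Deny from" lines take effect.
import re

ACCESS_LOG = "/var/log/apache/access_log"        # assumed path
HTACCESS = "/home/site/public_html/.htaccess"    # assumed path
THRESHOLD = 4

# Match common-log-format lines whose status code is 403
line_re = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" 403 ')
counts = {}
for line in open(ACCESS_LOG):
    m = line_re.match(line)
    if m:
        ip = m.group(1)
        counts[ip] = counts.get(ip, 0) + 1

existing = open(HTACCESS).read()
banned = open(HTACCESS, "a")
for ip, hits in counts.items():
    if hits >= THRESHOLD and ("Deny from " + ip) not in existing:
        banned.write("Deny from %s\n" % ip)
banned.close()

Run from cron every few minutes, something along these lines approximates the "3 or 4 403s and you're out" rule asked about in the first post.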

Jim

BlueSky
msg:1495683 - 4:11 am on Oct 28, 2003 (gmt 0)

I created a small custom 403 page and put a long sleep in it. It hasn't seen any action yet. I'm hoping that frustrates these guys so they go away after a few minutes. If not, it will take them about 25 hours to request the same number of pages they've been pulling in about five minutes.
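One way to build such a page is a small CGI script configured as the error document. The sketch below is an assumption about how it might be done (the script name, sleep length, and the idea of pointing ErrorDocument at a CGI are mine, not BlueSky's actual setup):

#!/usr/bin/env python
# Sketch: a deliberately slow custom 403 page, e.g. wired up with
#   ErrorDocument 403 /cgi-bin/slow403.py
import time

time.sleep(300)   # hold the connection for five minutes before answering
print("Status: 403 Forbidden")   # keep the 403 instead of a CGI 200
print("Content-Type: text/html")
print("")
print("<html><body><p>Access denied.</p></body></html>")

As the next reply points out, every request stuck in that sleep ties up a server process, so the sleep length has to be weighed against how many simultaneous connections the server can afford.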

jdMorgan
msg:1495684 - 4:25 am on Oct 28, 2003 (gmt 0)

Sky,

That's a good idea for crippling very simple 'bots. However, the ones that run multi-threaded can just open a new and separate connection while the previous request is still sleeping; the worst ones can hit you with dozens of requests in a second. Even so, they too have a limit on the number of threads they can support, so while this isn't a cure-all, it does usually slow them down to some extent. Just make sure your server can support a large number of (sleeping) processes, otherwise the sleepers may hurt you more than they hurt the bad 'bots! :(
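On a stock prefork Apache, the ceiling being described here is set by the MPM directives: each request sleeping in the 403 handler occupies one child process until it finishes, so MaxClients bounds how many can pile up. The values below are purely illustrative, not a recommendation from this thread:

# httpd.conf (prefork MPM) - illustrative values only
<IfModule prefork.c>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>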

Jim

BlueSky
msg:1495685 - 5:04 am on Oct 28, 2003 (gmt 0)

Jim:

The ones who have pulled pages very rapidly from my site for several minutes have all come from the same IP so far. You bring up a very good point on the processes, though. On second thought, I guess I'll take this down, just in case my jumpers decide to start attacking. The last thing I want to do is crash the server with too many sleeps. lolol

Back to the drawing board.
