
Best Practices for blocking in Windows Server, IIS, .NET environment?

webcentric

6:18 pm on Mar 18, 2014 (gmt 0)


This is my first post in this forum so my apologies if I've missed a discussion that already deals with this.

After reading quite a ways back in this forum, it's obvious that the majority of the discussion related to blocking centers on the Linux/Apache/PHP environment, and even those discussions are enlightening to me as a .NET developer. Still, when it comes down to blocking in a Windows environment, there are obvious differences in how the subject is approached.

I've used a variety of approaches to blocking in the past, including Windows Firewall, the IP Address and Domain Restrictions feature of IIS Manager, database lookups, etc. What I'm looking for now is a more comprehensive view of the subject and recommendations for the most scalable and effective approach to blocking in a Windows environment.
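For reference, the IP Address and Domain Restrictions feature ultimately just writes ipSecurity entries into web.config (or applicationHost.config at the server level). Something like this, with example addresses only:

<system.webServer>
  <security>
    <!-- Requires the IP and Domain Restrictions role feature to be installed -->
    <ipSecurity allowUnlisted="true">
      <!-- Deny a single address -->
      <add ipAddress="192.0.2.50" allowed="false" />
      <!-- Deny a whole range via subnet mask -->
      <add ipAddress="203.0.113.0" subnetMask="255.255.255.0" allowed="false" />
    </ipSecurity>
  </security>
</system.webServer>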

In the grand scheme of things, and to keep the deny lists as compact as possible, I tend to think in the following order of precedence, generally:

1. Blocking country traffic (like most of the world outside the US and Canada).
2. Blocking bots and server farms.
3. Blocking nuisance users.

I'm interested in providing the correct responses per type, as well as using server resources efficiently. It would be nice to just employ a hardware firewall for some of this, but it's not an option at this time. Any insights into this particular subject would be most appreciated. While I have some knowledge of the above approaches, I have no problem with answers that address the subject at a beginner's level either. It's amazing how easy it is to forget or totally miss very fundamental things, so I have nothing to prove and everything to gain from even basic information. I'm sure others will find this thread and hopefully see the value as well. Thanks in advance.

wilderness

10:39 pm on Mar 18, 2014 (gmt 0)


Here's an old thread [webmasterworld.com]

blend27

12:45 pm on Mar 19, 2014 (gmt 0)


In order to understand any problem, you first have to study the data it generates; in this case that's REQUEST data. Start logging HEADERS data/patterns programmatically in your app. Most sites I run are on IIS 7.x, and the "bot blocking firewall" is done programmatically:

1. IIS Request Filtering for the most blatant abuse: [iis.net...] (the closest thing IIS has to .htaccess; see the snippet after this list).
2. Whitelisting good bots (IP ranges) with exceptions (keep the ranges in memory; there are only a dozen or so).
3. Request headers examined (this includes UA, Referer, and such) << this is the JUICY part.
4. DB lookup to see if the IP has been banned already; if so, X it again (keep the last 10+ IPs in memory).
5. DB lookup for whitelisted country.
6. ALWAYS LOG THE REQUEST for later examination.
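For item 1, Request Filtering can scan headers as well as URLs. A minimal web.config sketch (IIS 7.5 or later; the deny string is just an illustration, build your own list):

<system.webServer>
  <security>
    <requestFiltering>
      <filteringRules>
        <!-- Reject any request whose User-Agent contains a listed string -->
        <filteringRule name="BlockBadBots" scanUrl="false" scanQueryString="false">
          <scanHeaders>
            <add requestHeader="User-Agent" />
          </scanHeaders>
          <denyStrings>
            <add string="libwww-perl" />
          </denyStrings>
        </filteringRule>
      </filteringRules>
    </requestFiltering>
  </security>
</system.webServer>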

3 & 4 could be switched when needed. But 6 is a must.

Now Get to Logging.

webcentric

4:08 pm on Mar 19, 2014 (gmt 0)


Start logging HEADERS data/patterns programmatically in your app.


Sounds like an excellent first step (I guess w3c log files aren't going to cut it for this). Do you recommend any particular approach to logging? One question is whether IIS Advanced Logging can accomplish what you're suggesting (but at the server level), or whether application-based logging is more efficient/adaptable to the circumstances. Also, are you suggesting logging to a DB or to a text file? There's also the question of whether to use an HTTP module, code something into Global.asax, put it in the pages themselves, or something else. I guess the best question to ask at this stage is: what's the best approach to capturing request info, and where should it be stored?

webcentric

8:30 pm on Mar 19, 2014 (gmt 0)


After having spent some quality time with the ServerVariables property of the Request object today, I've come up with a list of items that seem relevant for logging and subsequent analysis:

HTTP_X_FORWARDED_FOR
REMOTE_ADDR
REMOTE_HOST
REMOTE_PORT
REQUEST_METHOD
HTTP_REFERER
URL (base URL)
HTTP_URL (raw encoded)
UNENCODED_URL (obviously not encoded)
QUERY_STRING
HTTP_USER_AGENT
HTTPS

Of course, the log will need some date/time info, so the question becomes: what else would be useful to capture? I'm thinking that capturing the ALL_RAW and/or ALL_HTTP information would be useful as well (even if just for reference). I can see how the above information can support the logic for visitor analysis and blocking, so I'm thankful the point of logging was made.
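For what it's worth, here's roughly how I picture the capture side. LogStore is a made-up name for whatever ends up behind it (a DB insert or a file append):

using System;
using System.Collections.Generic;
using System.Web;

public static class RequestCapture
{
    // The server variables listed above.
    private static readonly string[] Keys =
    {
        "HTTP_X_FORWARDED_FOR", "REMOTE_ADDR", "REMOTE_HOST", "REMOTE_PORT",
        "REQUEST_METHOD", "HTTP_REFERER", "URL", "HTTP_URL",
        "UNENCODED_URL", "QUERY_STRING", "HTTP_USER_AGENT", "HTTPS"
    };

    public static void Capture(HttpRequest request)
    {
        var record = new Dictionary<string, string>();
        foreach (string key in Keys)
            record[key] = request.ServerVariables[key] ?? "";

        // Date/time plus the raw headers for reference.
        record["STAMP_UTC"] = DateTime.UtcNow.ToString("o");
        record["ALL_RAW"] = request.ServerVariables["ALL_RAW"] ?? "";

        LogStore.Write(record); // hypothetical: DB insert or file append
    }
}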

I also think I'm starting to see why logging may be best handled in the application itself. For one, it's probably the only way to go in a shared environment. I was originally thinking from a dedicated-server perspective, but building logging directly into the application makes it usable in either setting. More questions to come, I'm sure. Insights are always most welcome, as this line of questioning is sure to evolve.

dstiles

9:02 pm on Mar 19, 2014 (gmt 0)


I've been running ASP Classic sites for a long time. Over a decade ago I began trapping IPs of known bad UAs to a text file, using that file as a "blacklist" and updating it manually to group the server farms and such that this highlighted.

This was all right until the internet got to be a really nasty and dangerous place, and the text file began to take far too long to parse. At that point I began looking at a MySQL solution, automatically dumping bad IPs to the database and reading them back for each web page access across all sites (a couple of dozen sites at that point).

Each page includes the ASP script (via a virtual directory) that checks a whole range of factors, including unwanted UAs, media versions of otherwise acceptable bots and, of course, whether the IP has been previously blocked. If it's a DSL-type IP with a bad payload, the IP is added to the blocklist for 24 hours; if it comes back, it gets blocked for another 24, etc. If it's a server farm or a bad neighbourhood, the whole range gets banned, with an occasional hole drilled for good proxies.
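For the .NET folk here, the core decision boils down to something like this (a very loose translation, not my actual ASP; BlockList and its methods are invented names):

using System;

public static class BlockPolicy
{
    public static bool ShouldBlock(string ip, bool badPayload, bool isServerFarm)
    {
        var entry = BlockList.Find(ip); // hypothetical DB-backed store

        if (entry != null && entry.ExpiresUtc > DateTime.UtcNow)
        {
            // Came back while still blocked: extend for another 24 hours.
            BlockList.Extend(ip, TimeSpan.FromHours(24));
            return true;
        }

        if (isServerFarm)
        {
            // Server farm or bad neighbourhood: ban the whole range,
            // drilling occasional holes for good proxies elsewhere.
            BlockList.BanRange(ip);
            return true;
        }

        if (badPayload)
        {
            // DSL-type IP with a bad payload: 24-hour block.
            BlockList.Add(ip, TimeSpan.FromHours(24));
            return true;
        }

        return false;
    }
}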

The system generates a number of log files (about ten) that let me keep track of good bots, bad farms and new IPs. Of these the last is the most important, although the others are useful: e.g. recently the bots log showed a lot of Google failures that indicated an idiotic change of bot UA.

Typical delays for this system are around 10 to 50 ms when a log is written; faster when not. I consider this reasonable.

Being home-grown there is room in the ASP script to add special logging (eg if a site owner is having a problem) and for bypassing blocks or auto-blocking by country or ISP, as given in the database.

An advantage of this system is being able to determine, via a dedicated web page, which IPs are in the database and who they belong to (if an IP is not in the database it means it hasn't come to my notice, so it's probably benign). Through the same web page I can add, alter and (rarely) remove IP ranges. The downside is that I have to check the generated "new IP" log two or three times a day for new ranges to see whether I should a) block a server farm or b) enter it as a DSL range for future reference. If I don't do that, the IP is still blocked but not properly listed/banned.

From observation of this forum I suspect this is similar to how linux people hereabouts set up their .htaccess systems, which are probably more efficient; but that's ASP for you: wish I'd never been persuaded to use it. :(

There is, I understand, a poor-relation version of .htaccess for windows. In earlier versions it's proprietary and can work out expensive for a multiple-site server. Newer versions of windows have a version included - my server-rental guy says it's still crippled compared with linux/unix versions. I'm about to upgrade to a Windows 2012 server (from 2003) using the same approach but intend looking at .htaccess as a possible front-line buffer.

Oh, each web page access is still recorded in the relevant w3c site log. It's sometimes useful.

I long ago gave up on global.anything as useless and seldom working. Maybe that's just me.

blend27

1:07 am on Mar 20, 2014 (gmt 0)


@webcentric
I've come up with a list of items that seem relevant for logging

First rule of Fight Club: don't talk about Fight Club. This is what makes WebmasterWorld extendable to any environment.

Now: what else would be useful to capture?
In short: when you're starting to learn = everything! Any bit of data from the current request, and how the current request shares similarities with other requests. Then build your logic over 1. memory management and 2. DB-call optimization.

@dstiles
enter it as a DSL range for future reference.

Group them *itches by rDNS (period as the delimiter in the list).
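Something like this, reversing the labels so hosts from the same provider sort together (plain string handling, nothing IIS-specific):

using System;
using System.Linq;

public static class RdnsGrouping
{
    // "c-1-2-3-4.hsd1.wa.comcast.net" -> "net.comcast.wa.hsd1.c-1-2-3-4"
    public static string ReverseLabels(string host)
    {
        return string.Join(".", host.Split('.').Reverse());
    }
}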


@webcentric
Want to learn more? Set up a few bot-magnet sites, open forums, guest books; they're a dime a dozen. Then use rule #1.

In anything one does: do not be afraid to block the user, whoever they are, but don't do it on the production site.

webcentric

3:11 am on Mar 20, 2014 (gmt 0)


First rule about the Fight Club - don't talk about Fight Club.


Two black eyes, a busted lip and a broken nose later... I thought this was the gardening club. ;) Point taken... on the chin. This whole topic has me intrigued (gardening, that is) so I see lots of reading and listening in my future. I'll take a measured approach and work with the clues at a pace my sometimes-dormant receptors can handle. Many thanks for the wealth of information gleaned from this board already.

Then build your logic over 1. Memory management. 2. DB calls Optimization.


I think this is really the issue I'm working through at the moment, particularly optimizing DB access. Perhaps this leads to some questions better asked in a specific DB forum. Will try to stay on point (or better yet, just get into sponge mode) going forward. :)
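For now I'm picturing something like this to keep the DB out of the hot path; BannedIpStore stands in for the real lookup, and the 10-minute window is arbitrary:

using System;
using System.Runtime.Caching;

public static class BanCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static bool IsBanned(string ip)
    {
        // Serve cached answers (positive or negative) until they expire.
        object cached = Cache.Get(ip);
        if (cached != null)
            return (bool)cached;

        bool banned = BannedIpStore.IsBanned(ip); // hypothetical DB call
        Cache.Set(ip, banned, DateTimeOffset.UtcNow.AddMinutes(10));
        return banned;
    }
}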