@lucy24, @all
let me throw this one out there:
Sites I code are custom CMS-based sites hosted on the latest IIS servers, usually shared hosting with 100-ish other sites on the same box. Unless there is a DDoS against a specific site that gets picked up by the hosting company, there is really no significant overhead in using a dynamic scripting language connected to a DBMS with 10s of queries per page. The largest query to the DBMS goes against an indexed view that scans ip_RANGES_table (1,341,613 rows) LEFT JOINed to asn_table (77,180 rows), and that costs me 0.021 seconds. The rest of the queries cost almost 0 seconds by the time the final page is spit out to the browser. The DBMS lives on a separate box at the hosting provider with about 100-ish other databases, so no sweat there.
Now, for efficiency's sake, this is the way a request is processed, in the following order.
There are 3 levels of software in place in this firewall:
LEVEL I: IIS web.config
LEVEL II: .htaccess
LEVEL III: ColdFusion/DBMS
Level I - highest level: web.config (with rules (URL Rewrite 2 for IIS) that point to maps).
At this level I filter out/drop/rewrite requests that can easily be spotted by some simple logic based on
a) KNOWN IP ranges of AWS, Oracle, MSFT, TENCENT, major cloud scraping providers, etc., etc. are dropped outright, and I don't give a pack of flying Quality Diamond File Steel Quenched 6 Pcs Finishing Tools about it.... 0 bytes returned - no response code, can't connect.
b) ...previously blocked IPs, caught and written to a map file by rules - for example, if anything requests "wp-" URIs or .php-and-such in URIs* - stay in the map file for a month. On the first try, if caught, the request from that IP is rerouted (rewritten, not redirected) to another app that is not accessible from the outside web, so the main website application's code/server load is not affected.
^* basic rule: pattern="^.*(config|wp-|rss|\.php|\.git|\.inf|\.asp|\.aspx|\.pl|\.cgi|-cgi|\.env|env|\.aws|laravel).*" = rewrite to the other app's logic. Many other rules there too: SQL injections, super old UAs, UAs that don't make sense anymore, etc.....
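For illustration, here is a minimal sketch of what one such Level I rule can look like in web.config with URL Rewrite 2 - the rule name and trap URL (/trap/catch.cfm) are placeholders here, not my actual setup:

```xml
<rewrite>
  <rules>
    <!-- Catch probe URIs and hand them to the trap app, not the main site -->
    <rule name="ProbeURIs" stopProcessing="true">
      <match url="(config|wp-|rss|\.php|\.git|\.inf|\.asp|\.aspx|\.pl|\.cgi|-cgi|\.env|env|\.aws|laravel)" ignoreCase="true" />
      <action type="Rewrite" url="/trap/catch.cfm" />
    </rule>
  </rules>
</rewrite>
```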
The logic in that other application first writes to the main map file so the next request is dropped, then looks up the IP in the DBMS to determine if it is a known hosting range and, if found, simply increments the counter of blocked requests from that range. If NOT found, it looks up the IP via the AbuseIPDB API (free when registered) to see if it is a hosting IP; if it is, it blocks that IP range entirely in the DBMS (range reviewed later) - no playing games here. Request blocked, range blocked, ASN flag is on alert.
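The trap app's escalation order can be sketched roughly like this - the function and table names are hypothetical stand-ins, and the reputation lookup is passed in as a callable rather than a real AbuseIPDB HTTP call:

```python
# Hedged sketch of the Level I trap logic described above: map file,
# range keys and the AbuseIPDB call are illustrative placeholders.

def handle_caught_ip(ip, blocked_map, hosting_ranges, abuse_lookup):
    """Decide what to do with an IP caught by a rewrite rule.

    blocked_map    -- set of IPs already written to the rewrite map file
    hosting_ranges -- dict mapping a range key to a blocked-request counter
    abuse_lookup   -- callable(ip) -> usageType string (e.g. AbuseIPDB API)
    """
    # 1. Write to the map file first, so the *next* request drops at Level I.
    blocked_map.add(ip)

    # 2. Known hosting range? Just bump the counter for that range.
    range_key = ip.rsplit(".", 1)[0]       # crude /24 key for the sketch
    if range_key in hosting_ranges:
        hosting_ranges[range_key] += 1
        return "range_counter_incremented"

    # 3. Unknown: ask the reputation API whether it is a hosting IP.
    usage = abuse_lookup(ip)
    if usage == "Data Center/Web Hosting/Transit":
        hosting_ranges[range_key] = 1      # block the whole range, review later
        return "range_blocked"
    return "ip_blocked_only"
```

The point of step 1 coming first is that the map write protects Level I immediately, even if the DBMS or API lookup is slow.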
c) you name it....
LEVEL II (if you got here - you good, but wait for Level III): No headers are inspected here. Basic rewrite/point/redirect of the request to the main app folder based on what is requested, and some other blackmagicphuckery...
Notice that Main App has not seen a packet of traffic yet.
LEVEL III (Welcome to ColdFusion):
We inspect the HTTP headers with data from getHttpRequestData(): no good (many rules) = request is blocked and written to the DBMS for later inspection.
We make a call to the "Blocked IPs" table in the DBMS to see if it is there; seen you before in the past 6 months = no cheese, logged into the DBMS.
So at this point we still don't know where the IP came from (country, hosting, whitelisted bot). Whitelisted bots - and only a select few get a soft pass, unless they are trying to get to robots.txt-disallowed content. For the others we do a lookup, the same query by CF to the DBMS. Hosting or manually blocked range? = less than 16 bytes sent back (403). Not in the country list = 206 with either a captcha (self-written, polite) or an "accesso negato" (access denied) notice. RU, KR, CN, IR, IL, SA... etc. traffic is not even recorded (blank 403).
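The Level III decision order above can be sketched as one function - the country sets, names and return conventions here are placeholders for illustration, not my actual rules:

```python
# Hedged sketch of the Level III decision chain described above.

SILENT_COUNTRIES = {"RU", "KR", "CN", "IR", "IL", "SA"}   # blank 403, not logged
ALLOWED_COUNTRIES = {"US", "CA", "GB"}                    # placeholder list

def level3_decision(ip_info, is_whitelisted_bot, recently_blocked):
    """ip_info: dict with 'country' and 'is_hosting' keys.
    Returns (http_status, action)."""
    if recently_blocked:                 # seen in the "Blocked IPs" table
        return 403, "log"
    if is_whitelisted_bot:               # select few get a soft pass
        return 200, "allow"
    if ip_info["is_hosting"]:            # hosting or manually blocked range
        return 403, "short_body"         # less than 16 bytes sent back
    country = ip_info["country"]
    if country in SILENT_COUNTRIES:
        return 403, "blank"              # not recorded at all
    if country not in ALLOWED_COUNTRIES:
        return 206, "captcha"            # polite, self-written captcha
    return 200, "allow"
```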
-------------------------------------------------------
All requests at this point are written to the DBMS. The back-end app has a ColdFusion lookup utility that can be used to filter by:
Date From/To (goes back to 2006), IP/RDNS/ASN (all free-form), UA (free-form, 48,260 to choose from that have visited my sites) + UA type (Mobile/Tablet/Desktop), browser resolution, country list (include/exclude), whether Image/CSS/JS is enabled, URL visited (also includes any OLD URL being 301'd to a new URI, including whitelisted bots), site/URI referrer. That is for allowed traffic.
Blocked requests also include what brought them that status: country, bot trap, bad UA, bad URL, bad query string (SQL injections and such), HTTP method (no HEAD, no PUT, ok I will stop...), guest-book spam/form submission without a previously established session, self-referrer, OR referrer is from the http:// version of the same site, OR, let's say, they pretend to click on an imaginary link that is missing on a particular page (ye, that is also tracked, i keep my ship tight ;) ).
Sooooo..., IP data now (ever since SumGuy mentioned it, THANK YOU SumGuy) comes from IPInfo. I download the DB, parse the data into IP ranges by country, plus a separate table for ASN from the same data linking them together. Import the data into the DB, run a script to split it, update allowed countries and ASN blocks (hosting ranges based on that). Before that it was a similar automated script that got the data from the FTPs of regional IP data providers (ARIN and such) and maintained a separate list of hosting ranges.
Now here is the sweet spot: whenever a request comes in, whether it is blocked or allowed, the ranges for it are written to top-level lookup tables. This way I ask those tables for data first; only if it is not there, or, let's say, the ASN comes back blank from the IPInfo tables, would I make an API call somewhere else to determine what is what. Hundredths of a millisecond, if that.
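The cache-first pattern above can be sketched like so - the cache key, table shapes and API fallback are illustrative assumptions, not my actual schema:

```python
import bisect

# Hedged sketch of the "sweet spot" lookup: hot table first, full range
# table second (binary search, like an indexed-view seek), API last.

def make_lookup(hot_cache, ranges, api_fallback):
    """ranges: sorted list of (start_int, end_int, meta) tuples from the
    big IPInfo-derived table; hot_cache: dict of ranges already seen."""
    starts = [r[0] for r in ranges]

    def lookup(ip_int):
        # 1. Top-level table first: ranges this site has already seen.
        hit = hot_cache.get(ip_int >> 8)       # crude /24 cache key for the sketch
        if hit is not None:
            return hit
        # 2. Full range table: binary search for the containing range.
        i = bisect.bisect_right(starts, ip_int) - 1
        if i >= 0 and ranges[i][0] <= ip_int <= ranges[i][1]:
            meta = ranges[i][2]
        else:
            # 3. Last resort: external API (e.g. when the ASN came back blank).
            meta = api_fallback(ip_int)
        hot_cache[ip_int >> 8] = meta          # write back so the next hit is instant
        return meta

    return lookup
```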
The database is normalized to the teeth. I personally hold a certification from MS for SQL Server, and one from Oracle from back in the day. MySQL is not my thing.
AbuseIPDB FREE API calls are also full of juicy info: the usageType field (Data Center/Web Hosting/Transit, for example) and the Spam Confidence/Total Reports fields. I also contribute reports programmatically...
This is IPV4 Traffic only. Data maintained over 16 websites, a lot going on.
This code base started when the Florida Update hit (hard, very) and I started slaying MFA scrapers (it is their fault, I swear on a neighbor's squeaky piglet).
I have not used any other statistics applications on my sites since 2005; I write it myself and do not share visitors' data with the big boys. Customers who request an outside statistics implementation like Googs must pay more.
----------------------------------------------------------------
and then I can see it all LIVE, in the back-end App, ;)
I learned most of it in this Forum and WebmasterWorld, Thank You.
...blend27