|Bot, Agent, and High Abuse ISP Filtering Change|
| 5:46 pm on Jan 8, 2007 (gmt 0)|
I made the following change:
- First thread view from any IP is now open and unfiltered. Current time out is the cron job that does email (about 20 mins).
- The second page view within the timeout is still under the same access rules and restrictions.
- The previous htaccess bans are still in place.
What this means:
Visitors from the high speed/high abuse networks who were previously required to log in on their first thread view will be able to view one thread link before being prompted to log in and/or register.
It also means the board will be much slower at times, as we have to track all those IP addresses now. Load has already jumped from 32% to 55%, and we are just getting a good start at this.
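For illustration only, the rule described above could be sketched roughly like this in Python. This is not the board's actual code: the names `seen_ips`, `is_first_view`, and `purge` are hypothetical, and the in-memory set is an assumption made to keep the example short.

```python
# Hypothetical store of IPs that have already used their free thread view.
seen_ips = set()

def is_first_view(ip):
    """Return True if this IP has not viewed a thread since the last purge."""
    if ip in seen_ips:
        # Repeat view inside the window: the normal access rules apply.
        return False
    # First view in this window: remember the IP and let it through unfiltered.
    seen_ips.add(ip)
    return True

def purge():
    """Assumed to be called by the periodic cron job (about every 20 mins)."""
    seen_ips.clear()
```

A request handler would call `is_first_view(ip)` before applying the login rules, and skip them when it returns True.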
| 5:54 pm on Jan 8, 2007 (gmt 0)|
What makes this so CPU-intensive? Are the IPs all stored in one file?
If so, consider storing the IPs as a hash, or even simpler, make 256 files based on the initial octet of the IP address. That reduces the amount of data in each file while still having a reasonable number of files, and the filenames and data organization "make sense" at the human-readable/editable level.
Maybe you're already doing something like this, but that jump in the load sounds excessive.
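A rough Python sketch of the 256-file idea, just to show the shape of it. The function names, the `.txt` suffix, and the one-IP-per-line layout are all assumptions for the example, not anything the board is known to use.

```python
import os

def bucket_path(base_dir, ip):
    """Map an IPv4 address to one of up to 256 files, keyed by its first octet."""
    first_octet = ip.split(".")[0]
    return os.path.join(base_dir, first_octet + ".txt")

def record_ip(base_dir, ip):
    """Append the IP to its bucket file, one address per line."""
    with open(bucket_path(base_dir, ip), "a") as f:
        f.write(ip + "\n")

def is_recorded(base_dir, ip):
    """Scan only the one bucket file that could contain this IP."""
    path = bucket_path(base_dir, ip)
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return any(line.strip() == ip for line in f)
```

Each lookup reads one bucket file instead of the whole list, and the files stay easy to inspect or edit by hand.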
| 6:03 pm on Jan 8, 2007 (gmt 0)|
lol - I don't think I would load that many IPs into a hash, JD. That would kill a lot of precious memory in a hurry with Perl. I'm trying the simple hack where each IP is a filename under a subdirectory for its B block.
Ya, that's a lot of files, but it beats loading something in and parsing it on every page view. The overhead is in the file system. I'd rather abuse the file system than max out memory.
That should be good to 50-80k IPs an hour (which is our historical peak load).
So far - it isn't nearly as bad as I had thought.
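In Python terms (the real thing is Perl, so treat this as a sketch only), the one-file-per-IP hack looks something like the following. The directory naming (first two octets joined with a dot) and the function names are assumptions for the example.

```python
import os

def ip_marker_path(base_dir, ip):
    """Place each IP's marker file under a directory named for its B block."""
    octets = ip.split(".")
    b_block = ".".join(octets[:2])  # e.g. "192.0" for 192.0.2.1 (assumed layout)
    return os.path.join(base_dir, b_block, ip)

def seen_before(base_dir, ip):
    """Return True if the IP already has a marker file; otherwise create one."""
    path = ip_marker_path(base_dir, ip)
    if os.path.exists(path):
        return True
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()  # an empty file is enough; only its existence matters
    return False
```

The check is a single existence test on the file system, with nothing loaded or parsed per page view. Clearing the tracked IPs at the timeout would mean deleting the marker files, which is not shown here.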
| 10:08 pm on Jan 8, 2007 (gmt 0)|
Bret, wouldn't it be easier to convert the IPs to their integer representation, and just make the filename the IP range
126.96.36.199 - 188.8.131.52
so you can just check whether the IP falls into the range or not?
I think this would save some IO and be faster to boot, and not require any more memory than your other solution.
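A minimal Python sketch of the integer comparison, assuming IPv4 only. The helper names are made up for the example, and the addresses in the usage note are from the reserved documentation ranges, not from any real list.

```python
import ipaddress

def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))

def in_range(ip, start, end):
    """Return True if ip lies between start and end, inclusive."""
    return ip_to_int(start) <= ip_to_int(ip) <= ip_to_int(end)
```

For example, `in_range("192.0.2.77", "192.0.2.0", "192.0.2.255")` is True, since all three addresses reduce to plain integers that compare directly.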
| 4:29 am on Jan 9, 2007 (gmt 0)|
Ranges are good for filtering whole ISPs and networks, but not for individual user IP access filtering.
| 4:50 am on Jan 9, 2007 (gmt 0)|
* First thread view from any IP is now open and unfiltered. Current time out is the cron job that does email (about 20 mins).
I am curious why you made this change. Is it to offer a glimpse or “teaser” of what is available when they register, or...? Also, I am presuming that the supporters area is not included in this change.
| 7:15 pm on Jan 10, 2007 (gmt 0)|
Tastatura, there has been so much bot abuse from the big cable and DSL ISPs (rr, comcast, pacbell, earthlink, rogers, verizon, sbc, etc.) that we ended up putting most of them on the required login list. Obviously, not an ideal situation. I know, I know, I should write a required cookie routine. But what that means now is that about 10-15% of the people who visit are required to log in/cookie-ize before viewing threads. Unfortunately, those ISPs are where most of the web pros and our prime members connect. That old system works great for slowing down bot abuse, but thwarts easy linking to one-off threads from other places. This way, everyone gets the candy but the bots.