
WebmasterWorld Community Center Forum

Bot, Agent, and High Abuse ISP Filtering Change
Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3270248 posted 5:46 pm on Jan 8, 2007 (gmt 0)

I made the following change:
  • First thread view from any IP is now open and unfiltered. The current timeout is tied to the cron job that handles email (about 20 minutes).
  • Second and later page views within the timeout window are still under the same access rules and restrictions.
  • The previous htaccess bans are still in place.

What this means:

Visitors from the high-speed/high-abuse networks that were previously required to log in on their first thread view will now be able to view one thread before being prompted to log in and/or register.

It also means the board will be much slower at times, as we have to track all those IP addresses now. Load has already jumped from 32% to 55%, and we are just getting started on this.

 

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3270248 posted 5:54 pm on Jan 8, 2007 (gmt 0)

What makes this so CPU-intensive? Are the IPs all stored in one file?

If so, consider storing the IPs as a hash, or even simpler, make 256 files based on the initial octet of the IP address. That reduces the amount of data in each file while still having a reasonable number of files, and the filenames and data organization "make sense" at the human-readable/editable level.

Maybe you're already doing something like this, but that jump in the load sounds excessive.
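A rough Perl sketch of the 256-files-by-first-octet idea Jim describes; the paths and sub names are illustrative assumptions, not anything WebmasterWorld actually runs:

use strict;
use warnings;

my $base = '/webmasterworld/ipfiles';        # hypothetical shard directory

# One flat file per first octet, e.g. /webmasterworld/ipfiles/63.txt
sub shard_file {
    my ($ip)  = @_;
    my ($oct) = $ip =~ /^(\d+)\./;
    return "$base/$oct.txt";
}

# Scan only the one small shard to see if this IP is already listed.
sub ip_listed {
    my ($ip) = @_;
    open my $fh, '<', shard_file($ip) or return 0;
    while (my $line = <$fh>) {
        chomp $line;
        return 1 if $line eq $ip;
    }
    return 0;
}

# Append the IP to its shard if it isn't there yet.
sub add_ip {
    my ($ip) = @_;
    return if ip_listed($ip);
    open my $fh, '>>', shard_file($ip) or die "append failed: $!";
    print {$fh} "$ip\n";
    close $fh;
}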

Jim

Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3270248 posted 6:03 pm on Jan 8, 2007 (gmt 0)

lol - I don't think I would load that many IPs into a hash, JD. That would kill a lot of precious memory in a hurry with Perl. I'm trying the simple hack of making each IP a filename under a subdirectory for its leading octet.

e.g.:

.../webmasterworld/ips/63/
.../webmasterworld/ips/63/63.42.58.147

Ya, that's a lot of files, but it beats loading something in and parsing it on every page view. The overhead is in the filesystem. I'd rather abuse the filesystem than max out memory.

That should be good for 50-80k IPs an hour (which is our historical peak load).
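A hedged Perl sketch of that one-file-per-IP layout, with the timeout check folded in; the base path follows the example above, and the 20-minute window is an assumption taken from the first post:

use strict;
use warnings;
use File::Path qw(make_path);

my $base    = '/webmasterworld/ips';         # per the example paths above
my $timeout = 20 * 60;                       # assumed ~20-minute window

# Returns 1 if this IP has a marker file younger than the timeout
# (i.e. it already used its free view); otherwise touches the marker
# and returns 0 so the first view goes out unfiltered.
sub seen_recently {
    my ($ip)  = @_;
    my ($oct) = $ip =~ /^(\d+)\./;           # 63 for 63.42.58.147
    my $dir   = "$base/$oct";
    my $file  = "$dir/$ip";

    return 1 if -e $file && time() - (stat $file)[9] < $timeout;

    make_path($dir) unless -d $dir;
    open my $fh, '>', $file or warn "touch $file: $!";
    close $fh if $fh;
    return 0;
}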

So far - it isn't nearly as bad as I had thought.

Ocean10000

WebmasterWorld Administrator, 10+ Year Member



 
Msg#: 3270248 posted 10:08 pm on Jan 8, 2007 (gmt 0)

Brett, wouldn't it be easier to convert the IPs to their integer representation and just make the filename the IP range?

Example:
Rackspace.com, Ltd.
72.3.128.0 - 72.3.255.255
1208188928-1208221695

would be
/webmasterworld/ips/72/1208188928-1208221695

so you can just check whether the IP falls into the range or not.

I think this would save some I/O, be faster to boot, and not require any more memory than your other solution.
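A quick Perl sketch of the integer-range lookup Ocean describes; the directory layout and sub names are assumptions, and the packed values match the Rackspace example above:

use strict;
use warnings;

# Pack a dotted-quad IP into a 32-bit integer.
sub ip_to_int {
    my ($ip) = @_;
    my @o = split /\./, $ip;
    return ($o[0] << 24) | ($o[1] << 16) | ($o[2] << 8) | $o[3];
}

# Check every "start-end" file in the IP's first-octet directory.
sub ip_in_listed_range {
    my ($ip)  = @_;
    my ($oct) = $ip =~ /^(\d+)\./;
    my $n     = ip_to_int($ip);

    opendir my $dh, "/webmasterworld/ips/$oct" or return 0;
    while (my $name = readdir $dh) {
        next unless $name =~ /^(\d+)-(\d+)$/;   # filename encodes the range
        return 1 if $n >= $1 && $n <= $2;
    }
    closedir $dh;
    return 0;
}

# Matches the Rackspace example:
#   ip_to_int('72.3.128.0')   == 1208188928
#   ip_to_int('72.3.255.255') == 1208221695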

Ocean

Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3270248 posted 4:29 am on Jan 9, 2007 (gmt 0)

Ranges are good for filtering whole ISPs and networks, but not for individual-user IP access filtering.

Tastatura

5+ Year Member



 
Msg#: 3270248 posted 4:50 am on Jan 9, 2007 (gmt 0)


* First thread view from any IP is now open and unfiltered. The current timeout is tied to the cron job that handles email (about 20 minutes).

I am curious why you made this change - is it to offer a glimpse/"teaser" of what is available when they register, or...? Also, I am presuming the supporters' area is not included in this change.

Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3270248 posted 7:15 pm on Jan 10, 2007 (gmt 0)

Tastatura, there has been so much bot abuse from the big cable and DSL ISPs (RR, Comcast, Pacbell, Earthlink, Rogers, Verizon, SBC... etc.) that we ended up putting most of them on the required-login list. Obviously not an ideal situation. I know, I know, I should write a required-cookie routine. But what it means right now is that about 10-15% of the people who visit are required to log in/get cookied before viewing threads. Unfortunately, those ISPs are where most of the web pros and our prime members connect from. The old system works great for slowing down bot abuse, but it thwarts easy linking to one-off threads from other places. This way, everyone gets the candy but the bots.
