To block "bad" IP ranges, or to allow "good" ones?

Dimitri

11:50 am on Jun 28, 2018 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hi

I didn't want to ruin the Server Farms topic, but still wanted to share a thought.

Like some others, I monitor activity at my sites closely: who accesses what, why, and from where. And I block a lot of IPs and IP ranges because of bot or otherwise suspicious activity I don't want. But it is endless; I now have more than 500,000 IPs or IP ranges blocked. I had to write a program to manage them, to monitor what is going on, to detect false positives, etc.

So sometimes I wonder if it wouldn't be easier to block all IPs and allow only known "good" ones: IPs of "good" robots, IPs of domestic/mobile internet connections, and company IP ranges that are not hosting servers... and accept the collateral damage.
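
For the record, a minimal sketch of what that default-deny idea looks like in Apache 2.4 terms (the CIDR ranges are documentation placeholders, not a real "good" list):

# Whitelist sketch: everything not matched below gets a 403.
# Needs mod_authz_core / mod_authz_host (Apache 2.4).
<RequireAny>
    # A "good" robot's verified range (placeholder)
    Require ip 192.0.2.0/24
    # A vetted domestic ISP range (placeholder)
    Require ip 198.51.100.0/24
</RequireAny>

Everything outside those ranges is the "collateral damage" part of the trade-off.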

tangor

1:14 pm on Jun 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This sounds like a choice between whitelisting (who I let in, blocking all others) and blacklisting (blocking them as I find them). Both methods have merit... but the whitelisting version sure cuts down on the workload!

That said, whitelisting means you'll have to maintain a secondary activity: discovering any new ranges that potentially need adding and then testing the results. That can be nearly as much work as blacklisting.

wilderness

2:22 pm on Jun 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Dimitri,
If you're blocking 500k IPs, then your ranges are too precise (narrow), erring on the side of caution for that one good visitor.

Each webmaster must determine what is beneficial or detrimental to their own site(s).
Only you know the objectives and markets of your site(s).

With the above in mind, nothing prevents you from creating your own extranet. Extranets are a major portion of the web.
Keep in mind that you should always leave room for a method of contact in your denies. Most visitors that are denied won't even bother to use the alternative contact, but a few will, and then you can poke holes for access.
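
A rough sketch of that "deny, but leave a door open" pattern in Apache 2.4 (the file name and the allowed range are placeholders):

# Everything in this directory is denied except the stated range,
# but the contact page stays reachable so denied visitors can write in.
Require ip 203.0.113.0/24
<Files "contact.html">
    Require all granted
</Files>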

TorontoBoy

2:59 pm on Jun 28, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Blacklisting will always be an ongoing task. Bot runners, new and old, always want to find new ways of annoying you, so you need a bit of maintenance. This is an arms race. It looks like your ranges are too narrow, so widen them up.

Whitelisting has the danger of not allowing people to browse your site. I do not do this, though it is easier on maintenance.

lucy24

4:21 pm on Jun 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Several years ago I got tired of playing whack-a-mole with IP ranges, and changed over to header-based access controls. At any given time I do still have a few (fairly small) IP ranges blocked if they've been flooding me with requests that can't get blocked by any other means. But in general, the occasional robots that get in--isolated pages, not top-to-bottom spiderings--are just not worth bothering about.

The underlying principle is: judge them based on what they do and how they do it, not where they live. She said, piously.
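
To give a flavor of "judge them by what they do" in Apache terms, one hedged sketch (the test is an example, not a recommendation; check it against your own logs before trusting it):

# Real browsers virtually always send an Accept header; a lot of crude
# scripts do not. mod_rewrite answers with a 403 when it is absent.
RewriteEngine On
RewriteCond %{HTTP:Accept} ^$
RewriteRule ^ - [F]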

TorontoBoy

5:01 pm on Jun 28, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



header-based access controls

Is there an easy way to learn more about this? Any links? I've been wanting to get into this but am a bit lost. How do you start?

lucy24

7:42 pm on Jun 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How do you start?
Well, by determining what server you’re on, because I only know how to do it in Apache. If that’s the case, mosey on over to the apache subforum, where we will talk about how to use mod_setenvif in conjunction with mod_authzthingummy.
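
For anyone who wants the general shape before heading over there, a bare-bones sketch of that combination (Apache 2.4; the User-Agent strings are placeholders):

# mod_setenvif flags the request...
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "python-requests" bad_bot
# ...and mod_authz_core acts on the flag.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>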

Edit: As the utterly predictable punchline to the above, I have just denied a /16 and a /24. I'll check back in a few months, when I will almost certainly find they got bored and went away.

keyplyr

2:38 am on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Most of my sites block 5k to 6k IP ranges. That keeps management easier on high-traffic sites; smaller, less traffic-intensive sites can use simpler methods.

You do need to apply some discretion, though, letting beneficial agents through.

Blocking Methods [webmasterworld.com]
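
The general shape, as a sketch (the ranges are placeholders; in practice the deny list runs to thousands of entries and lives in an include file):

<RequireAny>
    # Verified beneficial agents pass first (placeholder range)
    Require ip 192.0.2.0/24
    # Everyone else is checked against the deny list
    <RequireAll>
        Require all granted
        Require not ip 198.51.100.0/24
        Require not ip 203.0.113.0/24
    </RequireAll>
</RequireAny>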

keyplyr

3:12 am on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wonder if it wouldn't be easier to block all IP, and to allow only known "good" ones. Like IP from "good" robots, and IP from domestic/mobile internet connection, as well as companies IP ranges, which are not hosting servers... and accept collateral damages.
The sum of the ranges most of us would block is significantly smaller than the sum of the ranges most of us would allow.


Reminder: Any code examples/discussion should be done in the Apache Code Forum [webmasterworld.com]

not2easy

3:46 am on Jun 29, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Speaking of header-based access controls, we had a discussion some years ago that might help with ideas - not discussing it here, just pointing it out: [webmasterworld.com...]

lucy24

5:36 am on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, thanks, not2easy, that lets me say with confidence that yes, it is iBill's logheaders code I've been using all this time. (How the heck do you find this stuff? Seems like on every forum there's one person whose superpower is finding ancient threads.)

Dimitri

10:27 am on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Thank you for your replies.

My initial post was also a little bit ironic, seeing the amount of suspicious activity constantly increasing, and from new IP ranges all the time. If you removed #*$!, p2p, spam and suspicious bot activity, real internet usage would be a fraction of what it is.

keyplyr

10:46 am on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Be careful about what IP ranges get added to your block list.

Very often, compromised ISP accounts are used by malicious bots. Blocking that IP address is futile since it is usually detected and fixed by the ISP after a day or two.

Likewise with botnets. They may hit your server using a dozen or more IP addresses but blocking them is useless since the computers you are blocking will likely never be used again by that bad actor.

Dimitri

12:18 pm on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Be careful about what IP ranges get added to your block list.

Yes, I am. Depending on the infringement detected, I block a single IP for a given length of time (from 5 minutes to 24 hours); as more infringements are detected repeatedly, the ban period is extended. Since I am paranoid, I am monitoring everything too. It's certainly not perfect, but so far so good.

not2easy

12:49 pm on Jun 29, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you are blocking IPs individually for behavior, it could make your life easier to block by behavior. A list of UAs, a list of requests, a list of researched IPs can make your log analysis more manageable.
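
As a sketch, those three lists can sit side by side in the config (all patterns here are placeholders):

# A list of UAs
SetEnvIfNoCase User-Agent "scrapy" bad_req
# A list of requests
SetEnvIfNoCase Request_URI "\.(bak|sql)$" bad_req
<RequireAll>
    Require all granted
    Require not env bad_req
    # A list of researched IPs
    Require not ip 203.0.113.0/24
</RequireAll>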

@lucy24 - I searched for the suggested name of the file: "logheaders.php" in DDG. ;)

blend27

1:06 pm on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am monitoring everything too. It's certainly not perfect, but so far so good.

Set up more than one site as bait & compare notes. Merge rules based on similarity. Set up a few forum-based sites on throwaway domains that are not linked to you and let them be spammed & scraped all they want. Learn.

Then get a Fish Tank! :)

I started in 2003, but still get scraped once in a while. It is a brain stimulant, if you will.

added:

Is it a valid SE bot (including RDNS and IP ranges)? See the sketch below.
Are headers valid (including UA)?
Is it a hosting-range IP?
Was the IP previously banned for any reason (I have a numbering system, 1-22, based on several factors above and below)?
Is it a valid country IP range?
Do they have a Fish Tank (just kidding... :)) = the ability to answer a custom CAPTCHA if they got caught before.

Welcome to My Sites!
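
On the RDNS item above: Apache 2.4's mod_authz_host can do the forward-confirmed reverse DNS check itself when you authorize by hostname, at the cost of a DNS lookup on matching requests. A sketch (the hostname suffixes are the publicly documented ones for Googlebot and Bingbot, but verify them yourself):

# "Require host" makes Apache do a double reverse-DNS lookup: the IP's
# PTR name must resolve back to the same IP before the match counts.
<RequireAny>
    Require host googlebot.com
    Require host search.msn.com
    # ...plus whatever your normal access rules are
</RequireAny>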

Dimitri

2:47 pm on Jun 29, 2018 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



added:

Yes, I am roughly doing the same.

There are also:

- buggy DNS and RDNS
- awkward referers
- hits on a page that a human cannot know about, and which is disallowed in robots.txt
- all attempts to probe for exploits of WordPress or other CMSes... since I am not using any (a sketch follows below).
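
For that last item, a sketch of refusing the usual WordPress probe paths on a site that does not run WordPress (the paths are the commonly probed ones; adjust to your own logs):

# None of these paths exist on a non-WordPress site, so any request
# for them is a probe; mod_rewrite answers with a 403.
RewriteEngine On
RewriteRule ^(wp-admin|wp-login\.php|wp-content|xmlrpc\.php) - [F]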