What do we do with all the IP blocks gathered?

What technology should be used to block crawlers: .htaccess or scripts?


dougwilson

5:26 pm on Oct 12, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month




System: The following message was cut out of thread at: http://www.webmasterworld.com/search_engine_spiders/4691876.htm [webmasterworld.com] by ocean10000 - 10:24 am on Oct 12, 2014 (utc -8)


Now that we've gathered all these IP blocks, what do we do?

I only have htaccess to work with... would we want to add all these there? Or are you using a script?

If script, could I get a copy?

I'd like to block all creepiness: webhosts, tools, datacenters... But a megabyte-size htaccess seems too "big"... maybe I'm wrong and it's fine...

I don't know what the effect would be. I know people have whole countries blocked, so...? Just wondering.

aristotle

6:52 pm on Oct 12, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In my experience you can block the large majority of bad bots by using characteristics such as country domain codes, user-agent fragments, and referer fragments. Using long IP lists to pre-block "clouds" and "server farms" is popular here on this forum, but in my opinion it is time-consuming and often less efficient, and it also requires vigilance in case blocks are sold or leased in the future. To me, it comes down to how much time you can devote to blocking things. I generally give minor offenders a pass, but take action if something starts becoming obnoxious.
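For instance, a rough sketch of the UA/referer approach, assuming mod_setenvif is available (the fragments below are placeholders for illustration, not a recommended list):

# Tag requests whose User-Agent or Referer contains a fragment,
# then deny anything tagged (Apache 2.2-style syntax, to match
# the Order/Allow/Deny directives used elsewhere in this thread).
SetEnvIfNoCase User-Agent "scrapy|libwww|httrack" bad_bot
SetEnvIfNoCase Referer "example-spam-domain\." bad_ref
Order allow,deny
Allow from all
Deny from env=bad_bot
Deny from env=bad_ref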

lucy24

7:02 pm on Oct 12, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Deny from blahblah


Repeat ad lib.

For most practical purposes there is no difference between htaccess and config; very few people have the resources to set up a preliminary firewall. Your htaccess will not be megabytes. Mine is
:: shuffling papers ::
currently 24k for the htaccess shared by all domains on my userspace. The site-specific htaccess files add about the same again. Yes, it does mean the server has to plow through the whole thing on every request. (Most stuff in config is read just once, at server startup.) But 5 or 10 or 50k is still way smaller than the actual files you'll be sending out when you add up everything that belongs to a page. If you've got analytics, the .js part alone is probably that big, and you don't even think about it.
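(If you do have access to the main config, the same list can live there instead; a sketch, with a placeholder path:)

# In httpd.conf rather than .htaccess: parsed once at server
# startup instead of being re-read on every request.
<Directory "/var/www/example-site">
    Order allow,deny
    Allow from all
    Deny from 203.0.113.0/24
</Directory>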

Unfortunately there is no way to say

Deny from China
Deny from Ukraine

et cetera, so you have to do it by the numbers. List the IP blocks in numerical order, plus any other grouping that's convenient to you. I've found it's easiest to keep track if I have separate sections for RIPE, ARIN and China. (AfriNIC and LACNIC are something like one line each. Maybe two for the rest of APNIC.)
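Something like this, to sketch the grouping (the ranges are documentation addresses standing in for real lists):

# --- RIPE (Europe) ---
Deny from 192.0.2.0/24
# --- ARIN (North America) ---
Deny from 198.51.100.0/24
# --- APNIC / China ---
Deny from 203.0.113.0/24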

Most people use the form
Order allow,deny
Allow from all
Deny from long-list-here

Whitelisting using
Order deny,allow
tends to rely more heavily on user-agents rather than IP ranges.
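Side by side, the two patterns come out something like this (a sketch using IP ranges on both sides for symmetry; in practice, as noted, the whitelist's allow side is often driven by user-agent tests instead):

# Blacklist: let everyone in except the listed ranges.
Order allow,deny
Allow from all
Deny from 203.0.113.0/24

# Whitelist: shut everyone out except what you explicitly allow.
Order deny,allow
Deny from all
Allow from 192.0.2.0/24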

not2easy

7:10 pm on Oct 12, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



One more thing - you don't want to deny every single IP listed here, and don't bother blocking individual IPs. The IPs are listed here in CIDR blocks because that way you can block everything that might come from that host/server farm. Blocking one IP means they'll be back in an hour with the last digit different.
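In other words (192.0.2.x being a documentation range used purely for illustration):

# One line covers all 256 addresses, so the "same bot, new last
# octet" trick gets nowhere:
Deny from 192.0.2.0/24
# ...instead of chasing them one address at a time:
# Deny from 192.0.2.15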

To know what to block, look at your access logs. See what imitation humans and known UAs are visiting and start with those. There is a ton of useful information here in the charter to help you learn the basics.

Another 'One more thing' - don't plan to add some code and walk away, because these things are ever changing and evolving, which is what makes this part of the forum so useful for this work.

keyplyr

10:06 pm on Oct 12, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You don't want to deny every single IP listed here...

Personally, I block every server farm & hosting company range I find out about whether I detect their activity in my logs or whether I see their ranges posted here or elsewhere. Simply put - other hosting servers have absolutely no legit reason to talk to my hosting server.

...don't bother blocking individual IPs.

I do, sometimes a dozen or more a day. These are mostly infected machines sending probes for vulnerabilities. They will often be used later in a botnet, so it is prudent to block them immediately. I block these on a temporary basis and usually remove them after a couple of months; presumably by then the owners have found and fixed the infection.
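In practice that can be as simple as keeping a dated section in the file, so the temporary entries are easy to find and prune later (dates and addresses invented for illustration):

# --- Temporary: probing IPs added Oct 2014, review in Dec ---
Deny from 192.0.2.15
Deny from 198.51.100.27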

I don't worry about the size of the htaccess file. Done correctly, there will not be any noticeable impact on your site response time. What IS more of a concern is the speed & efficiency of your server and the backbone/path it is on. Also, if you are on a shared hosting service, the resource management of the other accounts on your machine should be considered. One resource hog can slow all the other sites on that machine. Investigate and choose your neighbors wisely.

dougwilson

11:44 pm on Oct 12, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Okay, thanks, and yes, I would like to block every host, LLC, datacenter, etc... I can deal with individuals... I have a good server (a well-run Debian).

I block about 60% of visits to my sites as it is now... and yes, my only concern is page load.

I'll start at the beginning here and work through the posts... thanks again

not2easy

1:29 am on Oct 13, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Well, yes, I block a few individuals who are running bots on Verizon and Comcast IPs. What I meant was don't bother to block individual IPs in a known range. I've seen people do that a lot or I wouldn't mention it.

dougwilson

1:56 am on Oct 13, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



"I meant to not bother to block individual IPs in a known range"

I understand, I mean I can tolerate a few people being stupid...

aristotle

1:34 pm on Oct 13, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



keyplyr wrote:
I do, sometimes a dozen or more a day. These are mostly infected machines sending probes for vulnerabilities. They will often be used later in a botnet, so it is prudent to block them immediately. I block these on a temporary basis and usually remove them after a couple of months; presumably by then the owners have found and fixed the infection.

I don't think you should assume that these will be found out and fixed by the owner within a two-month period. If the trojan is part of a botnet, it may "hide" on the infected machine so as not to reveal itself until the botnet has been built up to a much larger size. Then at some point all of the trojans in the botnet will be called into action simultaneously to stage a DDOS attack. But it might take much longer than two months for that to happen.

I also think that it is futile to try to protect yourself against a large-scale botnet attack by blocking individual IP addresses, because you would have to block tens of thousands of them, which wouldn't be practical.

keyplyr

10:03 pm on Oct 13, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month






I don't think you should assume that these will be found out and fixed by the owner within a two-month period.
I just said how I do it... you do it your own way.

If the trojan is part of a botnet, it may "hide" on the infected machine so as not to reveal itself until the botnet has been built up to a much larger size. Then at some point all of the trojans in the botnet will be called into action simultaneously to stage a DDOS attack. But it might take much longer than two months for that to happen.
Well, then you wouldn't know about it, would you?

I also think that it is futile to try to protect yourself against a large-scale botnet attack by blocking individual IP addresses, because you would have to block tens of thousands of them, which wouldn't be practical.

I have never seen a botnet attack where "tens of thousands" of individual IPs have hit my server. Typically, I see one to two dozen IPs, which I usually have blocked.

dougwilson

5:07 pm on Oct 14, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Doing some IP lookups this morning... got to thinking about htaccess file size and my original question... I was thinking more about processing time than size.

I guess I could do some testing and find out the difference in milliseconds between this and that, but I'm sitting right on top of my server, so it's different for those in Spain or Australia.

Here's a question... does this:

deny from 173.208.0.0/16

take longer than this:

deny from 173.208.64.0/22

right now, with everything as is I get this:

Server Software: Apache
Server Hostname: www-dgswilson-com
Server Port: 80

Document Path: /footer.txt
Document Length: 1897 bytes

Concurrency Level: 1
Time taken for tests: 0.839 seconds
Complete requests: 10
Failed requests: 0
Write errors: 0
Total transferred: 21770 bytes
HTML transferred: 18970 bytes
Requests per second: 11.92 [#/sec] (mean)
Time per request: 83.875 [ms] (mean)
Time per request: 83.875 [ms] (mean, across all concurrent requests)
Transfer rate: 25.35 [Kbytes/sec] received

aristotle

5:52 pm on Oct 14, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



keyplyr - Thanks for your reply.
What I wanted to tell you is that the visits from infected machines that you're seeing now, a dozen or so per day, could be part of a preparation for an eventual major attack against your site. According to this article [globaldots.com], almost 3000 DDOS attacks occur per day worldwide, so I don't think anyone should assume that they're completely safe.

When you block the IP of an infected machine, you're blocking something that has already come and gone on what might well have been just a reconnaissance visit. The trojan might never come again except much later as part of an all-out DDOS attack. These reconnaissance visits would likely take place immediately after a machine is initially infected. Then the trojan could go into hiding by going dormant until it is re-awakened by a command from whoever controls the botnet.

A dozen or so visits a day isn't a serious attack, which is why I think they could be reconnaissance visits. The botnet has to be built up to a large size, tens of thousands of infected machines, before an effective DDOS attack can occur. So the operator of the botnet would wait until then before waking up the trojans and unleashing them in an all-out attack.

aristotle

7:06 pm on Oct 14, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's a question... does this:
deny from 173.208.0.0/16
take longer than this:
deny from 173.208.64.0/22

Using CIDR notation to cover a whole range in one line shortens the overall file size, so the more you can consolidate, the better.
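For example (addresses chosen only to show the arithmetic), four adjacent /24s collapse into a single /22:

# These four lines...
#   Deny from 198.51.100.0/24
#   Deny from 198.51.101.0/24
#   Deny from 198.51.102.0/24
#   Deny from 198.51.103.0/24
# ...become one:
Deny from 198.51.100.0/22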

keyplyr

7:48 pm on Oct 14, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's a question... does this:
deny from 173.208.0.0/16
take longer than this:
deny from 173.208.64.0/22

@ dougwilson - no, no difference whatsoever.

@ aristotle - not to say that article did not have valid points, but once again, I am commenting from experience. I started a thread on this a month or two ago where I outlined all this. I do not build my defenses on what others suppose *might* happen; I set up my security by what *does* and what *has* happened at my sites, and it works very well.

lucy24

8:04 pm on Oct 14, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The real question is, does
Deny from 173.208.0.0/16
take more, less or identical time compared to
Deny from 173.208
where the latter uses 7 fewer bytes of filesize.

:: business with calculator ::

146 lines like that (with or without initial "Deny from" doesn't matter), and you've shaved 1K off your htaccess.

In the case of /16 vs. /22: Is the server really savvy enough to stop counting at 16? Or does it check all the way to the end, and only then count how many bits matched?

dougwilson

8:16 pm on Oct 14, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



"no difference whatsoever"

okay, thanks, it made no difference between me and the server... but I didn't know about the overall picture

"Is the server really savvy"

I thought the server read everything in htaccess and delivered according to all directives

just wondered about adding up split seconds

keyplyr

9:05 pm on Oct 14, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@ dougwilson - If the IP of the request matches a deny rule in your htaccess, it matters not whether the denied range is small or wide; it will still be a match and the directive will be obeyed, so there is no difference in the time it takes to carry out this directive.

Now, there are other server directives that, depending on how they are written and in what order, can affect the response time (a little). For that discussion, you should probably post your questions in the Apache Web Server forum.

lucy24

9:57 am on Oct 15, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I thought the server read everything in htaccess

Everything in the htaccess, sure. Although come to think of it: does it read the entire htaccess before processing a request? How exasperating to have to plow through 500 lines of China when the visitor has already been blocked on the grounds of being a Ukrainian botnet.

I was wondering about something slightly different: reading the request itself. In the case of
173.208.0.0/16 or 173.208
the server only has to match the first 16 bits of the binary IP, while in
173.208.64.0/22
it has to match 22 bits. So the question is: at the point when it's executing this specific line, does it read the entire IP, or only the first 16 or 22 bits?

Obviously at this point we really are getting into nanoseconds.
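(For what it's worth, my assumption about the internals, not something I've verified in the Apache source: each rule is stored as a network/mask pair and tested with one 32-bit mask-and-compare, regardless of prefix length.)

# Conceptually, each rule is checked as:
#   (client_ip AND netmask) == network
# one 32-bit operation whether the prefix is /16 or /22:
#   173.208.0.0/16  -> mask 255.255.0.0   (173.208.0.0 - 173.208.255.255)
#   173.208.64.0/22 -> mask 255.255.252.0 (173.208.64.0 - 173.208.67.255)
Deny from 173.208.0.0/16
Deny from 173.208.64.0/22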

wilderness

10:56 am on Oct 15, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Obviously at this point we really are getting into nanoseconds.


And it only makes a difference (a slowdown) if your server and/or site is already bogged down by something else, such as a very large or corrupted MySQL database behind your PHP.

Angonasec

11:52 am on Oct 15, 2014 (gmt 0)



Poor Mr. Wilson's thread has been Lucy'd into an interesting cul-de-sac.

Subjectively, I can tell when our sites' htaccess files have tipped over 20K in bulk, simply by testing on Y!Slow (I don't use GPagespeed anymore).

Already, at 22k I'm looking to pare it down.

So, fellow logheads, why not answer your own queries by setting up tests and checking the results on Y!Slow?

I suspect your assumptions will be challenged.

dougwilson

3:14 pm on Oct 15, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



"post your questions in the Apache Web Server forum"

Okay... I did learn from asking the question... at least what people who have been doing this longer than I have think about it

So I thank you for your answers and observations

keyplyr

7:06 pm on Oct 15, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month





Subjectively, I can tell when our sites' htaccess files have tipped over 20K in bulk

Several of mine are over 80k and one just passed 100k. Google Pagespeed says they are fast.