
How to centralize administration of things to block

An alternative to "A close to perfect .haccess ban list"

     
10:44 pm on Oct 12, 2002 (gmt 0)



If you have root access, there is a solution using mod_rewrite which allows you to keep a single set of external files for the UAs, IPs and referrers to block.

How it works

mod_rewrite allows you to use hash tables to insert/substitute fields in RewriteCond and RewriteRule directives through a key lookup. These tables can be stored in plain ASCII files or in NDBM files. This is the feature that we want: external files.

To block certain UAs, IP addresses or referrers we try to look them up in the hash table. If they are contained in the table, we block the request; otherwise we allow it.
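
A quick note on the lookup syntax itself, as documented for mod_rewrite: ${mapname:key} expands to the value stored for key (or to the empty string if the key is absent), and ${mapname:key|default} returns default instead when the key is missing. A placeholder example:

    # expands to the stored value for somekey, or "NOT-FOUND" if absent
    RewriteCond ${somemap:somekey|NOT-FOUND} ^b$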

Since this approach uses a key lookup, we cannot use regular expressions here, so we need to include each and every UA, IP and URL we want to block in the tables. Additionally, keys cannot contain spaces.

For the UA string we need to strip the spaces off before passing it to the lookup function. Since we have only 9 slots for backreferences available, we can only differentiate between UA strings up to the 9th space. In this implementation I chose to take only the first 4 words of the UA string.
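
To illustrate with a made-up UA string: "Wget/1.8.1 (linux-gnu)" splits into %1 = "Wget/1.8.1" and %2 = "(linux-gnu)" (%3 and %4 stay empty), so the lookup key is the space-free concatenation of the captures:

    # capture the first four space-separated words of the UA
    RewriteCond %{HTTP_USER_AGENT} ^([^\ ]+)\ *([^\ ]*)\ *([^\ ]*)\ *([^\ ]*)
    # key passed to the map: "Wget/1.8.1(linux-gnu)"
    RewriteCond ${hashmap:%1%2%3%4} ^b$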

For the aforementioned reasons, one needs to decide whether one wants to block specific referrer URLs or whole domains.

Generally speaking, to compensate for the lack of regular expressions in the lookup we need to strip off unwanted information ("http://" and the path for the referrer, spaces for UAs) before doing the key lookup. This stripping is done by a RewriteCond rule which precedes the one doing the actual lookup.

Despite those shortcomings and the extra work involved, this approach is still slightly faster than the ordinary approach used in the A Close to perfect .htaccess ban list [webmasterworld.com] thread. And it adds the additional benefit of having only one set of data to maintain.

Implementation

  • In your httpd.conf add the following lines:

    RewriteMap testmap txt:/var/www/html/testmap
    RewriteMap iptestmap txt:/var/www/html/iptestmap
    RewriteMap reftestmap txt:/var/www/html/reftestmap

    or, for the DBM variant:

    RewriteMap hashmap dbm:/var/www/html/hashmap
    RewriteMap iphashmap dbm:/var/www/html/iphashmap
    RewriteMap refhashmap dbm:/var/www/html/refhashmap

    Each map gets its own file, so the UA, IP and referrer keys are kept separate.

  • In each .htaccess stick the following lines:

    RewriteEngine On
    #
    # block by user agent: look up the first four words, spaces removed
    RewriteCond %{HTTP_USER_AGENT} ^([^\ ]+)\ *([^\ ]*)\ *([^\ ]*)\ *([^\ ]*)
    RewriteCond ${hashmap:%1%2%3%4} ^b$
    RewriteRule !^path/to/error/docs/[0-9]{3}\.php - [F,L]
    #
    # block by client IP address
    RewriteCond ${iphashmap:%{REMOTE_ADDR}} ^b$
    RewriteRule !^path/to/error/docs/[0-9]{3}\.php - [F,L]
    #
    # block by referring domain
    RewriteCond %{HTTP_REFERER} ^http://([^/]+)
    RewriteCond ${refhashmap:%1} ^b$
    RewriteRule !^path/to/error/docs/[0-9]{3}\.php - [F,L]

    For the referrer we use the domain only. If you want to use more than the domain, add additional parentheses to the RewriteCond %{HTTP_REFERER} ^http://([^/]+) rule. Don't forget to add the appropriate backreferences to the rule doing the lookup.

  • Create a map file like this:

    BlackWidow b 
    Botmailto:craftbot@yahoo.com b
    ChinaClaw b
    DISCo b
    DownloadDemon b
    ...
    Wget b
    Widow b
    Xaldon b
    WebSpider b
    Zeus b

    Remove spaces from user agent names.

  • For the iphashmap create a file like this:

    218.211.127.156 b 
    154.216.125.110 b

  • For the refhashmap create a file like this:

    www.webmasterworld.com b 
    www.aaroncarter.com b

    Convert each of the above files to the NDBM format as explained in the Hash File section of the RewriteMap [httpd.apache.org] documentation; a sketch of such a conversion script follows after this list.

    The b is just an arbitrary string ("blocked") that is returned by the key lookup and that we check for in the RewriteCond rule.
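
Here is the promised sketch of a txt-to-NDBM converter, along the lines of the example in the RewriteMap documentation (invoke it with the text file and the DBM base name as arguments):

    #!/usr/bin/perl
    # txt2dbm -- convert a plain-text map file to NDBM format
    use NDBM_File;
    use Fcntl;

    ($txtmap, $dbmmap) = @ARGV;
    open(TXT, "<$txtmap") or die "Couldn't open $txtmap!\n";
    tie(%DB, 'NDBM_File', $dbmmap, O_RDWR|O_TRUNC|O_CREAT, 0644)
        or die "Couldn't create $dbmmap!\n";
    while (<TXT>) {
        next if (/^\s*#/ or /^\s*$/);            # skip comments and blanks
        $DB{$1} = $2 if (/^\s*(\S+)\s+(\S+)/);   # "key value" pairs
    }
    untie %DB;
    close(TXT);

For example, ./txt2dbm hashmap.txt hashmap produces the hashmap DBM files referenced in httpd.conf.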

Benchmark Results

The setup is the same as here [webmasterworld.com].

  • .htaccess, single RewriteCond
    AaronCarter needed 1.58516982468692 seconds. 
    BlackWidow. needed 1.68941572579471 seconds.
    Zeus....... needed 1.59178618951277 seconds.

  • .htaccess, txt mapfile
    AaronCarter needed 1.54459900205786 seconds. 
    BlackWidow. needed 1.57292873209173 seconds.
    Zeus....... needed 1.5213089206002 seconds.

  • .htaccess, dbm mapfile
    AaronCarter needed 1.52674262090163 seconds. 
    BlackWidow. needed 1.56979447061365 seconds.
    Zeus....... needed 1.52950318293138 seconds.

  • Difference (single RewriteCond minus dbm mapfile)
    AaronCarter: 0.05842720378529 -> not blocked 
    BlackWidow.: 0.11962125518106 -> blocked
    Zeus.......: 0.06228300658139 -> blocked

Again, using the dbm map file is between 0.05 and 0.1 ms faster per request than using the single RewriteCond rule. It would be interesting to know how this solution scales compared to the normal RewriteCond method. While I didn't test this, I would expect the dbm map file to be a lot faster once the list holds lots of entries.

Of course, if you have root access it would be smarter still to include the rules in httpd.conf instead of the .htaccess file. This would speed things up slightly, too. As a general rule, avoid .htaccess files altogether if you can. I realize most can't.
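
A minimal sketch of what that could look like in a virtual host (the server name is a placeholder; note that outside of .htaccess the RewriteRule pattern matches the full URL path, so it starts with a slash):

    <VirtualHost *>
        ServerName www.example.com
        RewriteEngine On
        # same IP lookup as in the .htaccess version above
        RewriteCond ${iphashmap:%{REMOTE_ADDR}} ^b$
        RewriteRule !^/path/to/error/docs/[0-9]{3}\.php - [F]
    </VirtualHost>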

Pros/Cons

pros

- single file for all (virtual) servers / all your projects
- slightly faster than the ordinary RewriteCond method
- no mod_perl needed

cons

- no regular expressions available
- no spaces in map keys allowed
- no case-insensitive matching
- root access needed

See also

Module mod_rewrite URL Rewriting Engine [httpd.apache.org]
A Close to perfect .htaccess ban list [webmasterworld.com]
How (and Where) best to control access [webmasterworld.com]

Hope this is useful. Comments welcome.

Andreas

3:58 pm on Oct 14, 2002 (gmt 0)

jeremy_goodrich


Wow, that's slick. It looks like a good way to go.

Being proactive about banning more oddly named user agents that don't tell you what they are doing on your site is next on my personal 'todo list' :) Thanks for posting this.

1:48 pm on Oct 16, 2002 (gmt 0)



If you don't care much for the external file but want to centralize the administration of RewriteRules for all your virtual servers anyway, declare the blocking rules in your main server config section and set RewriteOptions [httpd.apache.org] to inherit.
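
A minimal sketch of that setup (the virtual host details are placeholders):

    # main server section of httpd.conf
    RewriteEngine On
    RewriteCond ${iphashmap:%{REMOTE_ADDR}} ^b$
    RewriteRule .* - [F]

    <VirtualHost *>
        ServerName www.example.com
        # pick up the rules declared in the main server section
        RewriteEngine On
        RewriteOptions inherit
    </VirtualHost>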

Andreas

6:58 pm on Oct 17, 2002 (gmt 0)



Andreas:

Great method! And great tutorial, thanks for the write-up.

If I may, can I propose an alternative?

My method has some advantages over yours (you can have spaces, match regular expressions, and match beyond 9 places), but also a possible disadvantage (it MUST have mod_perl). It shares the same idea of one centralized list for all servers, and I believe it is as fast as or faster than the equivalent .htaccess rules (but I have no way of testing, or at least no knowledge of how to test!)

I use the Apache mod_perl BlockAgent module, slightly modified. I used that as a base and rewrote it to block IPs (and I believe you helped with that, Andreas - thanks!)

Then, in each virtual host in httpd.conf, call them thus:

<Location />
PerlAccessHandler Apache::BlockAgent
PerlSetVar BlockAgentFile /path/to/bad_agents.txt
PerlAccessHandler Apache::BlockIP
PerlSetVar BlockIPFile /path/to/bad_ip.txt
</Location>
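
In simplified form, an IP-blocking access handler along these lines might look like the sketch below under mod_perl 1 (a sketch only; BlockAgent-style modules also cache the compiled pattern list rather than re-reading the file on every request):

package Apache::BlockIP;
use strict;
use Apache::Constants qw(:common);

sub handler {
    my $r = shift;
    # path set via "PerlSetVar BlockIPFile ..." in httpd.conf
    my $file = $r->dir_config('BlockIPFile') or return DECLINED;
    my $ip = $r->connection->remote_ip;
    open(PATS, "<$file") or return DECLINED;
    while (my $pat = <PATS>) {
        chomp $pat;
        next if $pat =~ /^\s*(#|$)/;   # skip comments and blank lines
        if ($ip =~ /$pat/) {
            close(PATS);
            return FORBIDDEN;          # send 403 for matching IPs
        }
    }
    close(PATS);
    return OK;
}
1;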

Sample of BadAgents.txt:

^\$botname
# ALL-UPPERCASE letters, and nothing but those letters, 6 or more in a row
^([A-L]|[N-Z])[B-Z][A-Z]{4,}$
# Begin with Date
^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\ (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\ [0-3][0-9]\ [0-2][0-9]\:[0-5][0-9]\:[0-5][0-9]\ ..T\ 2[0-9][0-9][0-9]Mozilla\/[0-9]\.[0-9][0-9]
almaden
^Anarchie
^ASPSeek
AspTear
^atSpider
^attach

Sample of BadIP.txt:

^63\.174\.33\.196$
# ICaughtYou.Com
^63\.144\.231\.
# Cyveillance
^63\.148\.99\.2(2[4-9]|[34][0-9]|5[0-5])$

NOTE: Files MUST be uploaded in ASCII mode or it will not work.

Used in combination with a spider trap that writes to BadIP.txt rather than .htaccess, this is a great system!

dave

 
