Banning from httpd.conf

Sample <directory> for perusal.

         

Etruscan

6:49 pm on Nov 8, 2005 (gmt 0)

10+ Year Member


So I started playing with .htaccess last night, in an effort to keep out some of the nasty bots I've been getting lately. However, since I run the server and have complete access to the httpd.conf file, I thought it would likely be better to keep my bot and IP bans in there. I understand this works better anyway.

What I would like to know is if the below will work. Any feedback would be helpful.

<Directory "c:/www/public_html/htdocs">

Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all

Deny from 111.111.111.111
Deny from 111.111.111.112
Deny from 111.111.111.113
Deny from 123.123.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^RufusBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

</Directory>

JAB Creations

6:51 pm on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can't answer your question but what about .htaccess don't you feel will do the job?

Etruscan

6:56 pm on Nov 8, 2005 (gmt 0)

10+ Year Member



...well, from what I understand, .htaccess should be reserved for use when you don't have access to httpd.conf. It can be slightly slower, as the server needs to look for the file in every directory (and subdirectory) leading up to the requested page and parse it, which it does on every request.

It's a good alternative for those without access to the conf file, but seems unnecessarily inefficient given that I do have access.
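
A minimal sketch of the point above, assuming the document root from the first post: with `AllowOverride None` in effect, Apache does not look for .htaccess files under that directory at all, so the per-request lookup cost described here goes away. The path is the one from the original configuration; nothing else is specific to that server.

```apache
# Sketch only: the path is taken from the first post in this thread.
<Directory "c:/www/public_html/htdocs">
    # With overrides disabled, Apache does not even try to read
    # .htaccess files in this directory tree.
    AllowOverride None
</Directory>
```

The trade-off is that every rule then has to live in httpd.conf, which is what the configuration in the first post already does.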

Etruscan

7:26 pm on Nov 8, 2005 (gmt 0)

10+ Year Member



Also, I have several virtual hosts... but this <Directory> is outside of the <VirtualHost> containers in the httpd.conf. Will this still apply to all the virtual hosts (which fall under the <Directory> location)?

JAB Creations

7:30 pm on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To my knowledge, httpd.conf should affect all sites hosted on that Apache box, whereas .htaccess will only affect that individual site. Again, I could be wrong...

If you're going to mod the httpd.conf in a way that may be questionable to some, you may want to talk about the changes you make with any clients also hosting with you. It would be nice to get confirmation or a correction from a third person, though.

Etruscan

7:36 pm on Nov 8, 2005 (gmt 0)

10+ Year Member



Well, all subdomains are mine, so no worry there. All the document roots are covered by the <Directory>, but the <VirtualHost> containers themselves are not within the <Directory> container... they are outside it.

I'm not sure of the difference between putting them inside as opposed to out... but I need those Rewrites in <Directory> to affect all the virtual hosts... which I think they will (but I want to be sure).

...and of course, the other reason I posted was that I want to make sure I haven't messed up the code for the <Directory> in the first place.
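
For reference, a rough sketch of the layout being described. As far as I understand Apache's section handling, a <Directory> block in the main server config matches on filesystem path, so it should apply to any virtual host whose DocumentRoot falls under that path; this is worth confirming by testing rather than taking on faith. The host names and the site1/site2 subdirectories below are hypothetical, and the single Deny line and single RewriteCond stand in for the full lists in the first post.

```apache
# Sketch only: host names and the site1/site2 subdirectories are hypothetical.
# Apache 2.0/2.2-era syntax, matching the configuration in the first post.

# Main server config: matched by filesystem path, whichever
# virtual host ends up serving the file.
<Directory "c:/www/public_html/htdocs">
    # FollowSymLinks must be enabled for mod_rewrite to run in
    # a per-directory context.
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
    Deny from 111.111.111.111

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^Wget
    RewriteRule ^.* - [F]
</Directory>

NameVirtualHost *:80

<VirtualHost *:80>
    ServerName site1.example.com
    DocumentRoot "c:/www/public_html/htdocs/site1"
</VirtualHost>

<VirtualHost *:80>
    ServerName site2.example.com
    DocumentRoot "c:/www/public_html/htdocs/site2"
</VirtualHost>
```

Note that this relies on the rules being inside <Directory>. Rewrite rules placed directly at the main server level, outside any container, are not inherited by virtual hosts by default.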

jdMorgan

7:37 pm on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Putting the code in httpd.conf is better performance-wise. This is because the code in httpd.conf is compiled on server restart. The same code in .htaccess is interpreted for each HTTP request. So your server performance will be better if the code is in httpd.conf.

The only downside is that the server must be restarted (to re-compile the code) before any changes made to the code will take effect.

As to 'will this work?', we really don't do code reviews; demand would quickly outstrip the supply of volunteer contributors here. I'd suggest you test the code and if you have any problems, then we can review the code in light of the reported error.

You will need to be careful that the patterns you use for 'bad bots' are correct. Most in your list are start-anchored, meaning they will only match if the user-agent starts with the specified string. Often, these lists are collected by folks who don't understand regular expressions; they see that most patterns start with a "^" character, and incorrectly deduce that all must start with that anchor. So, be on the lookout for user-agents that seem to make it past your block list; it's likely that the patterns for those user-agents are incorrectly anchored. On the other hand, it is good to anchor the patterns if possible; it makes them more specific and speeds up processing roughly n-fold, where n is the length of the actual user-agent minus the length of the pattern in characters.
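
To illustrate the anchoring point, here is a small sketch. "ExampleBot" is a made-up token used only for illustration; Wget is taken from the list in the first post, and its user-agent does begin with "Wget". The assumption is that the made-up bot identifies itself inside a Mozilla-style string, which is where a start anchor would silently fail.

```apache
# Sketch only: "ExampleBot" is a hypothetical user-agent token.
RewriteEngine On

# Start-anchored: correct for agents whose string begins with the
# name, e.g. "Wget/1.10".
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]

# Unanchored: needed when the name appears mid-string, e.g.
#   Mozilla/4.0 (compatible; ExampleBot 1.0)
# A pattern of ^ExampleBot would never match that string.
RewriteCond %{HTTP_USER_AGENT} ExampleBot

RewriteRule ^.* - [F]
```

Every RewriteCond except the last one in the chain carries [OR]; without it, that condition is ANDed with the ones that follow and the block as a whole stops matching.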

Jim

Etruscan

7:43 pm on Nov 8, 2005 (gmt 0)

10+ Year Member



Thanks Jim... I'll try the code out tonight. I don't see any glaring problems with it myself, but I'm fairly new to Apache so wanted to bounce it off the experts.

I may be back yet.