.htaccess dynamic ban agent question

Using php to write user agent bans to .htaccess

         

Duskrider

10:19 pm on Apr 30, 2007 (gmt 0)

10+ Year Member



Hey everyone,

I'm currently working on a bot-trap type of project incorporating a MySQL database and an admin panel that will allow the user (a web programmer) to add or remove entries in the .htaccess file without needing much knowledge of .htaccess.

I'm aware of the potential for disaster inside an .htaccess file, but I'm sure if the code is written correctly there shouldn't be any worries. Extensive testing for sure.

My question is about the user agent banning. Currently the bot trap will ban IP addresses (Deny from X) when the IP does something that makes it bad (ignoring robots.txt, for example). That IP deny gets written to .htaccess and added to a database of banned IPs. What I would like to do is allow the user to also ban the User Agent related to that IP by selecting the log entry from a table. The program would then get the UA from the database, wrap the appropriate code around it, and put it in the .htaccess file at the proper location. The User Agent is retrieved from the server as a single string containing the entire UA.

There's no way, at least none that I know of, to whittle a UA down to the important bit (e.g. HTTrack) from the whole UA string via a program. It's something that a human needs to evaluate. That being the case, the only option I would have is to include the entire UA string in the .htaccess file like:

RewriteCond %{HTTP_USER_AGENT} ^The sometimes incredibly long user agent string goes here$

Is there anything wrong with that? Does it violate a rule anywhere... will it cause any problems... is it just generally a bad idea... or will it be ok?

Thanks for any responses!

jdMorgan

11:06 pm on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to check the PHP and PERL forum libraries here at WebmasterWorld for previously-discussed programs and methods.

One thing that can really help keep things simple is to use the SetEnvIf directive to set a variable --commonly called "getout" in posts here-- and then test that variable later using a single "Deny from" or RewriteCond.
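A minimal sketch of the "getout" pattern described above, written in Python for brevity rather than the PHP used elsewhere in this thread. The helper name `setenvif_record` and the sample token are illustrative only. The generated lines assume mod_setenvif is available and that the host uses the Apache 2.2-style `Deny from env=` access control.

```python
import re

# Written into .htaccess once; this single test covers every generated record.
ACCESS_BLOCK = "Order Allow,Deny\nAllow from all\nDeny from env=getout\n"


def setenvif_record(token: str) -> str:
    """Return one SetEnvIfNoCase line that flags any UA containing `token`."""
    # Quotes, backslashes and control characters need more careful handling
    # than this sketch provides, so refuse them instead of guessing.
    if '"' in token or "\\" in token or any(ord(c) < 32 for c in token):
        raise ValueError("token needs manual escaping")
    pattern = re.escape(token)  # neutralise regex metacharacters
    return f'SetEnvIfNoCase User-Agent "{pattern}" getout\n'


if __name__ == "__main__":
    # One record per banned agent, then the shared access test.
    print(setenvif_record("HTTrack") + ACCESS_BLOCK, end="")
```

Each new ban is then a single self-contained line, so the script never has to edit the access test itself.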

The advantage is that you simply prepend records to the .htaccess file, which saves the trouble of having to read it in a line at a time, parse it, and find the right "insertion point." Remember, you're going to need to flock() the .htaccess file to prevent two or more concurrent threads from trying to 'edit' it simultaneously. If you don't flock the file, then the last thread to write to it 'wins' and the other threads' entries will be lost. So, simply prepending new records is both simple and fast, and requires the file to be locked for the shortest possible time.
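The prepend-under-lock idea can be sketched roughly as follows. This is a Python illustration (Unix only, since it relies on the `fcntl` module), not the PHP from any of the scripts mentioned in this thread, and the function name `prepend_record` is made up for the example.

```python
import fcntl
import os


def prepend_record(path: str, record: str) -> None:
    """Insert `record` at the top of `path` while holding an exclusive lock."""
    # O_CREAT makes the file if it is missing; nothing is truncated on open.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other writers are done
        try:
            existing = fh.read()        # read the current contents
            fh.seek(0)
            fh.truncate()               # empty the file, then rewrite it
            fh.write(record + existing)
            fh.flush()
            os.fsync(fh.fileno())       # push the bytes to disk before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

A call such as `prepend_record(".htaccess", "Deny from 192.0.2.1\n")` would put the new line first and leave the rest of the file as it was. The lock is advisory: it only coordinates writers that also call flock, and Apache itself reads the file without taking it.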

Take a look at the various versions of key_master's bad-bot PERL script, and xlcus'/alexk's runaway 'bot PHP script -- they're sure to give you some ideas...

As to the UA string, sure you can put the whole thing in there, just be sure to escape all 'special' regex characters (dots, parentheses, plus signs and so on), and either escape the spaces as well or put the whole pattern in quotes.
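Here is one way the escaping could be automated, sketched in Python rather than PHP. The function names `ua_to_pattern` and `rewrite_cond` are hypothetical, and the handling of double quotes and backslashes (hex escapes such as `\x22`) rests on my understanding of how Apache parses quoted arguments, so try the output on a test server before trusting it.

```python
import re


def ua_to_pattern(ua: str) -> str:
    """Return an anchored regex that matches exactly the literal string `ua`."""
    if any(ord(c) < 32 or ord(c) == 127 for c in ua):
        raise ValueError("control characters are not expected in a User-Agent")
    parts = []
    for ch in ua:
        if ch == '"':
            parts.append(r"\x22")        # avoid a bare quote inside the quoted pattern
        elif ch == "\\":
            parts.append(r"\x5C")        # avoid a literal backslash pair
        else:
            parts.append(re.escape(ch))  # escapes dots, parentheses, spaces, ...
    return "^" + "".join(parts) + "$"


def rewrite_cond(ua: str) -> str:
    """Wrap the escaped pattern in a quoted RewriteCond line."""
    return f'RewriteCond %{{HTTP_USER_AGENT}} "{ua_to_pattern(ua)}"\n'


if __name__ == "__main__":
    # Sample UA string, made up for illustration.
    print(rewrite_cond("Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"), end="")
```

Because the pattern is anchored at both ends, it only matches that exact UA string; a bot that changes a single character of its version number will slip past it, which is the trade-off of banning on the full string.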

Jim

[edited by: jdMorgan at 11:41 pm (utc) on April 30, 2007]

Duskrider

1:04 am on May 1, 2007 (gmt 0)

10+ Year Member



Thanks for the response.

I'll check into that SetEnvIf information. I've got a version of a bot-trap that I'm using as a rough model, and I have the flock down and the insertion already working. In PHP my find-point-then-insert code is only one command long so it shouldn't be too bad as far as access time. I started out only adding lines to the end of .htaccess but it got pretty messy looking fast... so I wanted to try to keep it as clean as possible.

Thanks for the heads up though. If I didn't have my base bot-trap script to work from I would have never thought to lock the file.