Forum Moderators: coopster

Writing to .htaccess

The Simplest, Safest Method?


inuwolf

4:15 pm on Feb 23, 2006 (gmt 0)

10+ Year Member



I read "Blocking Badly Behaved Bots" and its predecessor, and I'd like to ask a relatively simple question:

Assuming I already have the bad IP to ban (let's call it $ip), how would I write the IP to the appropriate line in .htaccess, and safely?

I'd like to use a relatively simple script like this one:

<?php
$file = $_SERVER['DOCUMENT_ROOT'] .'/.htaccess';
$fp = fopen($file, 'a');
fwrite($fp, "Deny from " . $ip . "\n");
fclose($fp);
?>

But again, I'm not sure how safe this is, and it doesn't write the IP to the correct line. Any suggestions? I know the other anti-bot scripts do this, but they are very (needlessly?) complicated and I haven't been able to get them to write the IP to the right line. Please help out a newbie trying to defend his site's vulnerable forms! Thanks.
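For reference, a sketch of a slightly safer version of the snippet above (the helper names are illustrative, not from any of the scripts discussed): it validates the address and takes an exclusive lock so two simultaneous requests can't interleave their writes.

```php
<?php
// Build the line in a small helper so it can be tested on its own.
// Produces e.g. "Deny from 1.2.3.4" -- note the space before the address.
function buildDenyLine($ip)
{
    return "Deny from " . $ip . "\n";
}

// Append a Deny line for $ip to $file; returns true on success.
function appendDenyLine($file, $ip)
{
    // Refuse anything that isn't a plain IPv4/IPv6 address.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    $fp = fopen($file, 'a');
    if ($fp === false) {
        return false;
    }
    // Exclusive (advisory) lock: concurrent requests each append
    // one whole line instead of mixing their bytes together.
    if (flock($fp, LOCK_EX)) {
        fwrite($fp, buildDenyLine($ip));
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return true;
}
```

Called as appendDenyLine($_SERVER['DOCUMENT_ROOT'] . '/.htaccess', $ip), this still appends at the end of the file, which is the placement question the replies take up.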

SeanW

4:22 pm on Feb 23, 2006 (gmt 0)

10+ Year Member




it doesn't write the IP to the correct line.

I'm not sure what you mean... Can you explain?

Sean

inuwolf

5:00 pm on Feb 23, 2006 (gmt 0)

10+ Year Member



Sorry,
I'd like to write each IP directly beneath the line in my .htaccess reading "order allow,deny". Using the trap script described in the Apache forum just added the IP to the very top of the .htaccess file.

jatar_k

6:23 pm on Feb 23, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



you would have to read through the file

open the file
pull its contents into a var
find the spot where the new data needs to be written
insert the new data
rewrite the whole thing to your file
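jatar_k's five steps can be sketched like this (the marker line and helper names are assumptions for illustration, not code from the thread):

```php
<?php
// Return $contents with $newLine inserted directly after the first
// line matching $marker (case-insensitive); if the marker is missing,
// prepend $newLine instead.
function insertAfterMarker($contents, $marker, $newLine)
{
    $lines = explode("\n", $contents);
    foreach ($lines as $i => $line) {
        if (strcasecmp(trim($line), $marker) === 0) {
            array_splice($lines, $i + 1, 0, array($newLine));
            return implode("\n", $lines);
        }
    }
    return $newLine . "\n" . $contents;
}

// The read / modify / rewrite cycle described above.
function addDenyDirective($file, $ip)
{
    $contents = file_get_contents($file);              // open + pull into a var
    $updated  = insertAfterMarker($contents,           // find the spot
                                  'order allow,deny',
                                  'deny from ' . $ip); // insert the new data
    file_put_contents($file, $updated, LOCK_EX);       // rewrite the whole thing
}
```

Note the race window between the read and the rewrite: two overlapping requests can lose an update, which is one reason an append-only scheme is attractive.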

jdMorgan

7:05 pm on Feb 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can save the bother of parsing the existing .htaccess file to find the insertion point with a slight change of plans:

Instead of writing


fwrite($fp, "Deny from".$ip."\n");

in the PHP code, write

fwrite($fp, "SetEnvIf %{REMOTE_ADDR} \"^".$ip."$\"getout\n");

Then in the .htaccess code, use the single line

Deny from env=getout

In this way, you can simply prepend the new .htaccess record to the existing file, and on most servers, you could actually append the new record to the existing file. This works because SetEnvIf directives in .htaccess are usually processed before almost all other modules' directives. The determining factor is the LoadModule order on Apache 1.x, and the module priority scheme on Apache 2.x. Which method you can use should be fairly easy to test.
(Note that on Apache 1.x, modules are processed in the reverse order of their appearance in the LoadModule list -- the first-loaded module executes last.)

Eliminating the need to parse through a lot of existing .htaccess records to find the insertion point should result in a measurable performance improvement in the 'insert new record' function, and it simplifies both the PHP and the "manual" .htaccess coding.

Jim

coopster

7:29 pm on Feb 23, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Neat idea, jd.

jdMorgan

7:35 pm on Feb 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only reason it's important to get this done fast is so that the very next HTTP request -- if not already in progress -- can be blocked; in the context of blocking content access, you want the record written as fast as possible. Even in the worst case of having to parse the entire .htaccess file to find the insertion point, overall server performance is unlikely to be affected, since the code only runs when your site is "under attack" from a bad bot -- and then only for the first request (or first few simultaneous requests) after the detection logic decides to block the bot.

Jim

inuwolf

10:45 pm on Feb 23, 2006 (gmt 0)

10+ Year Member



Jim, as simple as this seems, it's still not working for me. I've tried putting "deny from getout" everywhere in my .htaccess file, starting with the obvious, but it refuses to work. I've also tried chmoding my .htaccess file but then I get a 500 error. Any idea about what I might be doing wrong?

jdMorgan

1:34 am on Feb 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps an example of the .htaccess code will help:

# The following lines are written by the bad-bot scripts
SetEnvIf Remote_Addr ^66\.249\.***.101$ getout
SetEnvIf Remote_Addr ^211\.231\.**\.13$ getout
#
# ...other config directives...
#
# Block bad-bots using lines written above by bad bot script, but
# always allow robots.txt and 403.html error page to be fetched
SetEnvIf Request_URI "(403\.html¦robots\.txt)$" allow
<Files *>
Order Deny,Allow
Deny from env=getout
Allow from env=allow
</Files>

Change all broken pipe "¦" characters in code on WebmasterWorld to solid pipes before use; Posting here modifies that character.

When you get a server error, look at your server error log file. It will often tell you exactly what is wrong.

Note that my script escapes the literal periods in the $ip variable. Octets reading *** were intentionally-obscured to comply with the WebmasterWorld TOS.

Jim

inuwolf

2:54 am on Feb 24, 2006 (gmt 0)

10+ Year Member



thanks, it did. got it working :).

just a couple unimportant questions to ask. I have a custom 403 error page, but ironically the banned can't see it--they're forbidden from the forbidden page! I can still send them the message I want by dropping the .html code for the 403 page right into the .htaccess, but it would be cleaner if I could just let the banned see the banned page. any way to do this?

also, if the banned go to example.com, it comes up with the "Red Hat Enterprise Linux Test Page", when every other page, even example.com/index.html, comes up with the normal 403 page. why does this happen / how can I prevent it?

also, i've prevented my getout.php page from being spidered through robots.txt. is there any way a good crawler could still find it, even though i haven't even put in a link to it yet?

maybe i'm just overthinking things! anyway, thanks for helping me accomplish my main objective!
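On the last question: a compliant crawler that honors robots.txt will not fetch a disallowed URL even if it discovers a link to it, so an exclusion like the following (path taken from the post) is normally sufficient -- and only bots that ignore robots.txt should ever trigger the trap, which is the point:

```text
User-agent: *
Disallow: /getout.php
```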

jdMorgan

4:00 am on Feb 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a custom 403 error page, but ironically the banned can't see it--they're forbidden from the forbidden page!

Did you modify this line


SetEnvIf Request_URI "(403\.html¦robots\.txt)$" allow

to match your custom 403 page's name?

Did you note this?

Change all broken pipe "¦" characters in code on WebmasterWorld to solid pipes before use; Posting here modifies that character.

No idea why you'd see the default server page, unless it's just a by-product of this first problem.

Jim
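To make the custom page and the allow line agree, both must reference the same filename. A hypothetical fragment (the name forbidden.html is only an example -- substitute the real one), written with solid pipes as Jim notes:

```apache
# Serve a custom page for 403 responses
ErrorDocument 403 /forbidden.html
# Let even blocked clients fetch that page (and robots.txt)
SetEnvIf Request_URI "(forbidden\.html|robots\.txt)$" allow
```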

inuwolf

5:18 am on Feb 24, 2006 (gmt 0)

10+ Year Member



thanks, I changed it to allow the error file. I hadn't understood that part.

as for the server page, it has some weird instructions on it referring to a nonexistent file (welcome.conf) in a directory that doesn't exist (/etc/httpd/conf.d/) or at least isn't accessible by ftp; looks like I'll have to talk to my host about that one. fortunately it's not too pressing.

there's a minor problem in the script that should be corrected lest newbies like myself get hung up on it:

as code appears:

fwrite($fp, "SetEnvIf %{REMOTE_ADDR} \"^".$ip."$\"getout\n");

as code should be:

fwrite($fp, "SetEnvIf Remote_Addr ^{$REMOTE_ADDR} $ getout\n");

the former literally writes "%{REMOTE_ADDR}" to .htaccess.

EDIT: The script still does not work. With both versions, it bans ALL IPs! There must still be a problem with my .htaccess file, and I've written it exactly as your example, Jim...

jdMorgan

6:20 am on Feb 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> the former literally writes "%{REMOTE_ADDR}" to .htaccess.

Got that syntax confused with mod_rewrite's.

Jim

inuwolf

6:29 am on Feb 24, 2006 (gmt 0)

10+ Year Member



OK fixed.
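The thread stops at "OK fixed" without showing the final working write. A sketch consistent with Jim's .htaccess example -- SetEnvIf's Remote_Addr keyword rather than mod_rewrite's %{REMOTE_ADDR}, escaped dots, and no stray space before the $ anchor (helper names are illustrative, not the poster's actual fix):

```php
<?php
// Produce e.g.:  SetEnvIf Remote_Addr ^66\.249\.70\.101$ getout
// preg_quote() escapes the literal dots so the pattern matches only
// the exact address; "getout" pairs with "Deny from env=getout".
function buildGetoutLine($ip)
{
    return 'SetEnvIf Remote_Addr ^' . preg_quote($ip) . "$ getout\n";
}

// Append-only write; on most servers SetEnvIf takes effect regardless
// of its position in the file (subject to Jim's module-order caveat).
function appendGetoutLine($file, $ip)
{
    $fp = fopen($file, 'a');
    if ($fp === false) {
        return false;
    }
    flock($fp, LOCK_EX);
    fwrite($fp, buildGetoutLine($ip));
    flock($fp, LOCK_UN);
    fclose($fp);
    return true;
}
```

From the trap script this would be called as appendGetoutLine($_SERVER['DOCUMENT_ROOT'] . '/.htaccess', $_SERVER['REMOTE_ADDR']).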