Forum Moderators: coopster


Install Routine for Badly-Behaved-Bot Blocker

This post includes a request for help with part of the install routine

         

AlexK

6:28 pm on Aug 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've written a simple setup + install script for the bot-blocker routine. It will be of particular interest to those with little or no PHP experience. There is a part of the script that I wanted to include but did not, because I could not get it to work, so this post includes a request for help from more experienced coders at the bottom.

My thanks to docbird for inspiring the install routine. He sent me a sticky back on April 28, 2007 containing some very practical suggestions. That is the longest I have ever taken to answer a sticky, but I trust that he will agree that it was worth the wait.

Next, the request for help. The install routine customises the script to be able to work on the server where it is run, then at the very end (after the script is installed) tests that it works.

The following works fine:

<?php
// test the bot_blocker file
echo "Testing `$bot_blocker'...<br />\n";
if( file_exists( $bot_blocker )) {
    if( file_exists( $ipFile )) {
        echo "<b class='oops'>(sorry, `$ipFile' already exists &amp; cannot test)</b><br />\n";
    } else {
        echo "Using one include() iteration...<br />\n";
        include( $bot_blocker );
        if( file_exists( $ipFile )) {
            echo "<b class='ok'>Success!: tracking-file `$ipFile' created.</b><br /><br />\n";
        } else {
            echo "<b class='oops'>Fatal Error: tracking-file `$ipFile' not created.</b><br />\n";
        }
    }
} else {
    echo "<b class='oops'>(sorry, prog-write failed &amp; cannot test)</b><br />\n";
}
?>

This next does not work:
for( $i = 1; $i <= 10; $i++ ) { 
include( $bot_blocker );
}

At least one iteration works, since $ipFile's atime is increased by one, but the rest of the iterations do not. I'm sure that I am missing something very obvious here. Can anybody see it?

[edited by: coopster at 7:12 pm (utc) on Aug. 13, 2008]

minibear

2:12 pm on Aug 13, 2008 (gmt 0)

10+ Year Member



Hi, Alex Kemp,
I am a newbie and my English is not good.
This bot-block script is super good,
but I have a problem with the setup script.

error:

Performing atime-disabled check...

Whoops! Cannot create the tracking file.
$ipFile = `308' ($ipLength=`3').
$startTime = `1218633744'.
$hitsTime = `1218633744'.
Whilst trying to create one of the zero-byte tracking files, that action threw an error. This install script will not be able to function.

This is typically a permissions' issue, and indicates that the web-server-user may have read-only permission to the directory. That is a VERY good security option, but will prevent this install script from being able to go any further. On *nix, the directory will likely need to be at least chmod 0775.

my system is SuSE 10

chmod 0775 some_user
chown apache.apache some_user

my script location:'/home/botchecken/bot_blocker.php'
I have run
chmod 0775 /home/botchecken/
chown -R wwwrun.www /home/botchecken/

some_user:'/home/botchecken/'

less /etc/passwd
wwwrun:_:_:_:WWW daemon apache:/var/lib/wwwrun:/bin/false

drwxrwxrwx 4 wwwrun www 4096 13. Aug 05:45 botchecken
drwxrwxrwx 2 wwwrun www 4096 13. Aug 05:08 block
-rw-r--r-- 1 wwwrun www 5175 13. Aug 05:47 bot_blocker.php
drwxrwxrwx 2 wwwrun www 4096 13. Aug 04:52 logs

I have no idea what is wrong with the permission structure.
Please help!

AlexK

4:12 pm on Aug 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello minibear, Welcome to WebmasterWorld!

I've already replied to you on my own forums, but here is a little extra:

The Apache User & Group are set in `httpd.conf' - they tend to be `apache' or `nobody'. I suspect from your `/etc/passwd' extract that they are the latter. That will mean that Apache has rights to neither the owner nor the group-owner of your webspace. That will mean that you will have to set the permissions at:

chmod 0777 /home/botchecken

...and that should give setup.php permission to write a zero-byte file into the root of the webspace. It may also be necessary to change permission on the script:

chmod 0777 /home/botchecken/setup.php

...but I haven't checked that.
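For readers who want to check the directory before running the installer, here is a minimal sketch of a pre-flight test. It is not taken from setup.php: the function name `can_create_tracking_file()` and the probe-file name are made up for illustration, and the system temp directory merely stands in for the real install directory.

```php
<?php
// Hypothetical pre-flight check (NOT part of setup.php): can the
// web-server user create a zero-byte file in the given directory?
function can_create_tracking_file($dir)
{
    // The directory must exist and be writable by the current user.
    if (!is_dir($dir) || !is_writable($dir)) {
        return false;
    }
    // Try to create a zero-byte probe file; the name is illustrative only.
    $probe = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . 'bb_probe_' . getmypid();
    if (!@touch($probe)) {
        return false;
    }
    unlink($probe); // tidy up the probe file
    return true;
}

// Stand-in directory; substitute the directory that you intend to use.
$dir = sys_get_temp_dir();
echo can_create_tracking_file($dir) ? "OK: can write\n" : "Cannot write\n";
```

Run it through the web-server (not from a root shell), since it is the web-server user's permissions that matter here.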

PS
Your English is superior to my German!

[edited by: AlexK at 4:15 pm (utc) on Aug. 13, 2008]

coopster

4:26 pm on Aug 13, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I am guessing it has to do with stat cache. And on some filesystems, the atime won't be updated at all. I'm not certain if this is the issue, but it is worth investigation. Details on fileatime [php.net]
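As a small illustration of the stat-cache point, the sketch below clears PHP's cached stat data before reading a file's times. The helper name `fresh_times()` is hypothetical and the demo uses a throwaway file in the system temp directory, not the real tracking file.

```php
<?php
// Hypothetical helper (not from bot_blocker.php): return the current
// mtime and atime of a file, bypassing PHP's stat cache.
function fresh_times($path)
{
    // Drop any cached stat data for this path before asking again.
    clearstatcache(true, $path);
    return [filemtime($path), fileatime($path)];
}

// Demo on a throwaway file; touch() takes (filename, mtime, atime).
$demoFile = tempnam(sys_get_temp_dir(), 'bb_');
touch($demoFile, 1218644453, 1218644453);
list($m1, $a1) = fresh_times($demoFile);
touch($demoFile, 1218644453, 1218644454); // bump atime by one second
list($m2, $a2) = fresh_times($demoFile);
echo "mtime = $m2; atime = $a2\n";
unlink($demoFile);
```

Without the clearstatcache() call, repeated filemtime()/fileatime() calls on the same path within one request may return the values cached from the first call.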

AlexK

4:37 pm on Aug 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Managed to fix this for myself.

The following gets stopped mid-loop, proving that the routine works:

for( $i = 2; $i < 30; $i++ ) {
    echo "Include #$i... ";
    clearstatcache();
    include( $bot_blocker );
    $startTime = filemtime( $ipFile );
    $hitsTime = fileatime( $ipFile );
    echo "mtime = $startTime; atime = $hitsTime.<br />\n";
}

I swear that I tried clearstatcache() before. Well, whatever - it works now.

Here is a run that I did just now on my home server:

Finally... Program Test

Testing `bot_blocker.php'...
Using one include() iteration...
Success!: tracking-file `example.co.uk/www/block/07c' created.

Now multiple iterations...
(the following section of code will throw warnings due to `headers already sent'... sorry, no way to avoid this)

Include #2... mtime = 1218644453; atime = 1218644453.
Include #3... mtime = 1218644453; atime = 1218644454.
Include #4... mtime = 1218644453; atime = 1218644455.
Include #5... mtime = 1218644453; atime = 1218644456.
Include #6... mtime = 1218644453; atime = 1218644457.
Include #7... mtime = 1218644453; atime = 1218644458.
Include #8... mtime = 1218644453; atime = 1218644459.
Include #9... mtime = 1218644453; atime = 1218644460.
Include #10... mtime = 1218644453; atime = 1218644461.
Include #11... mtime = 1218644453; atime = 1218644462.
Include #12... mtime = 1218644453; atime = 1218644463.
Include #13... mtime = 1218644453; atime = 1218644464.
Include #14... mtime = 1218644453; atime = 1218644465.
Include #15...

Server under heavy load
You are scraping this site too quickly. Please wait at least 121 secs before retrying.

It's quite bizarre what a kick I get from seeing the block happening.

The changed file is due to be uplifted shortly. It is currently called `bot_blocker.7z'.

[edited by: coopster at 7:14 pm (utc) on Aug. 13, 2008]
[edit reason] exemplified [/edit]

AlexK

4:48 pm on Aug 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



coopster:
on some filesystems, the atime won't be updated at all

The setup-routine contains a section to check that on the server. If you look at minibear's post, you will see that that was the part where the setup check-routines stopped the script. The setup.php script was unable to check whether the filesystem was atime-disabled, since it was unable to save the check-file.
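For the curious, one way such a check could look is sketched below. This is an assumption about the general technique, not the code in setup.php: the function name `atime_updates_on_read()` is invented, and it only tests whether reading a file moves its atime forward on the filesystem holding the given directory.

```php
<?php
// Hypothetical check (NOT the setup.php routine): does reading a file
// update its atime on the filesystem that holds $dir?
// Returns true/false, or null if no test file could be created.
function atime_updates_on_read($dir)
{
    $file = tempnam($dir, 'bb_');
    if ($file === false) {
        return null;
    }
    file_put_contents($file, "x");
    // Push mtime and atime two days into the past, so that even a
    // "relatime" mount would refresh atime on the next read.
    $past = time() - 172800;
    touch($file, $past, $past);
    file_get_contents($file);          // the read that may update atime
    clearstatcache(true, $file);       // avoid a stale cached value
    $updated = fileatime($file) > $past;
    unlink($file);
    return $updated;
}

$result = atime_updates_on_read(sys_get_temp_dir());
var_dump($result);
```

A false result suggests a mount that does not record access times (for example `noatime`); the directory and the two-day offset are arbitrary choices for the sketch.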

PS
Will you edit the top-link in my first post please, coopster? It currently ends up using a double `pubcon.com' redirect and ends up at the WebmasterWorld home page, which is not exactly correct.

coopster

7:18 pm on Aug 13, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Fixed the double redirect.

Yeah, I figured you knew about atime and that your code was indeed checking it. I threw the note in there mostly for future readers that may not be aware of subtle differences with stat cache and various operating systems.

AlexK

9:21 pm on Aug 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for that, coopster, and well done for dropping on the problem-fix in one go.

Anyone have suggestions for avoiding the 0755 directory-permissions problem? It is certain to have to be like that for setup.php to be able to operate, but needs to be different afterwards.

As a specific question, what is the best way to universally prevent web-access to a specific directory (remember, not knowing the server OS nor the web-server software)? Does it exist?

minibear

12:07 am on Aug 14, 2008 (gmt 0)

10+ Year Member



Hi, Alex Kemp,
thank you.

First try, with the new setup script of Aug 13, 2008:

The zero-byte tracking file is not created.
No bot_blocker.*** is created.

ls -al /home/botchecken/
drwxr-xr-x 4 root root 4096 13. Aug 04:46 .
drwxr-xr-x 20 root root 4096 21. Jul 17:06 ..
drwxrwxrwx 2 root root 4096 14. Aug 00:54 block
-rwxrwxrwx 1 root root 8714 13. Aug 05:07 bot_blocker.php
drwxrwxrwx 2 root root 4096 13. Aug 04:52 logs

ls -al /srv/www/htdocs/setup.php
-rwxrwxrwx 1 root root 20701 14. Aug 00:48 /srv/www/htdocs/setup.php

apache error_log:
PHP Warning: touch(): Unable to create file 308 because Permission denied in /srv/www/htdocs/setup.php on line 128

Second try, setting up the all-bots script of Aug 07, 2008 directly:

The zero-byte tracking file 27e is created,
but no bot_blocker.*** is created.


ls -al /home/botchecken/block/
drwxrwxrwx 2 root root 4096 14. Aug 00:54 .
drwxrwxrwx 4 root root 4096 14. Aug 00:50 ..
-rw-r--r-- 1 wwwrun www 0 14. Aug 00:54 27e

.htaccess
<IfModule mod_php5.c>
php_value auto_prepend_file "/home/botchecken/bot_blocker.php"
</IfModule>

apache error_log:
[error][client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45
[error] [client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45, referer: http:mywebsite.com/

test all bots script


<?php
// test the bot_blocker file
$bot_blocker='/home/botchecken/bot_blocker.php';
echo "Testing `$bot_blocker'...<br />\n";
if( file_exists( $bot_blocker )) {
    if( file_exists( $ipFile )) {
        echo "<b class='oops'>(sorry, `$ipFile' already exists &amp; cannot test)</b><br />\n";
    } else {
        echo "Using one include() iteration...<br />\n";
        include( $bot_blocker );
        if( file_exists( $ipFile )) {
            echo "<b class='ok'>Success!: tracking-file `$ipFile' created.</b><br /><br />\n";
        } else {
            echo "<b class='oops'>Fatal Error: tracking-file `$ipFile' not created.</b><br />\n";
        }
    }
} else {
    echo "<b class='oops'>(sorry, prog-write failed &amp; cannot test)</b><br />\n";
}
?>

Info:
Testing `/home/botchecken/bot_blocker.php'...
(sorry, `/home/botchecken/block/27e' already exists & cannot test

remove code in .htaccess


<IfModule mod_php5.c>
php_value auto_prepend_file "/home/botchecken/bot_blocker.php"
</IfModule>

test again

Info:
Testing `/home/botchecken/bot_blocker.php'...
Using one include() iteration...
Success!: tracking-file `/home/botchecken/block/27e' created

[edited by: minibear at 1:03 am (utc) on Aug. 14, 2008]

g1smd

12:51 am on Aug 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The only way to stop web access to a folder is to make sure that it is "above root".

Otherwise every server type is different, and each has multiple ways to achieve it. In Apache you can "deny" access in httpd.conf or from .htaccess, or you can rewrite external (HTTP) URL requests to a path that does not exist, so generating a 404 error to the user.
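A rough way to test the "above root" condition from PHP is sketched here. The function name `is_outside_docroot()` is hypothetical, the paths in the usage lines are only examples, and the check compares resolved paths only; it cannot know about server-side aliases that map a URL onto the directory.

```php
<?php
// Hypothetical helper: is $dir located outside the web document root?
// Returns true/false, or null when either path cannot be resolved.
function is_outside_docroot($dir, $docRoot)
{
    $dirReal  = realpath($dir);
    $rootReal = realpath($docRoot);
    if ($dirReal === false || $rootReal === false) {
        return null; // cannot decide
    }
    // Compare with trailing separators so "/htdocs2" is not mistaken
    // for something inside "/htdocs".
    $rootPrefix = rtrim($rootReal, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    $dirPrefix  = rtrim($dirReal, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strpos($dirPrefix, $rootPrefix) !== 0;
}

// Example call; both arguments are placeholders for your own paths.
var_dump(is_outside_docroot(sys_get_temp_dir(), __DIR__));
```

In a web request the second argument would typically be `$_SERVER['DOCUMENT_ROOT']`, assuming the server sets it.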

AlexK

10:59 am on Aug 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi minibear, thanks for the response.

As I thought, a permissions issue. Easily fixed.

It seems that you are creating bot_blocker.php in one directory, then checking it in another.

`/home/botchecken' is `0755 root.root'. Apache will have read-only permission to that directory. It also looks likely that Apache is not set up for that directory. You do not give the permission structure for /srv/www/htdocs, so I cannot comment on that one.

The fix is exactly as in my earlier post. If the directory is web-accessible, do not forget to change the permissions back afterwards (but not on block/ & logs/).

AlexK

11:08 am on Aug 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for your input, g1smd.

Probably the best thing is to create a single directory from root of the web-space, and place everything within that. There will be little problem with folks accessing bot_blocker.php directly, in any case. The only result that they can have is to block themselves! My main desire was to prevent direct access to the tracking files. I guess that folks should use the most easily-accessible mechanism of their web-server to do that, and the setup script should leave them to do that themselves.

minibear

12:16 am on Aug 15, 2008 (gmt 0)

10+ Year Member



Thanks AlexK

apache error_log:
[error][client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45
[error] [client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45, referer: http:mywebsite.com/

Finally, I have only this one problem left :).

line 45
$ipRemote = ${$_SERVER_ARRAY}['REMOTE_ADDR'];
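The thread does not say how this was fixed, so the following is only a defensive sketch, not the fix that was applied. The PHP manual notes that variable variables cannot be used with the superglobal arrays inside functions or class methods, which may or may not be the cause here. The helper name `remote_addr()` is invented; it takes the server array as a parameter and returns null when the key is missing.

```php
<?php
// Hypothetical helper: read REMOTE_ADDR without raising an
// "Undefined index" notice when the key is absent (e.g. under CLI).
function remote_addr($server)
{
    if (isset($server['REMOTE_ADDR']) && $server['REMOTE_ADDR'] !== '') {
        return $server['REMOTE_ADDR'];
    }
    return null; // caller decides what to do without an address
}

// Usage: pass the superglobal directly instead of a variable variable.
$ipRemote = remote_addr($_SERVER);
if ($ipRemote === null) {
    echo "No REMOTE_ADDR available; skipping the bot check.\n";
} else {
    echo "Client address: $ipRemote\n";
}
```

Skipping the check when no address is present is a design choice for the sketch; a stricter site might prefer to refuse the request instead.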

AlexK

1:36 pm on Aug 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



(for other readers, minibear got the problem above fixed elsewhere [webmasterworld.com])

Version 0.11 released New! Improved!

Taking on the lessons of this thread, the setup & install script has been re-constituted. It now allows the root-directory for the bot_blocker script to be any (pre-existing) directory that the web-server has permission to write to. All script and sub-directory names can be customised.

This now means that the bot-blocker can be installed "above the root", even via http. It also guarantees that the bot-blocker script and all associated files will be within a single directory, which makes maintenance far simpler.

You can find the download file via the library link (first post on this thread, and first post on the library thread). The file to look for is "bot_blocker_v0.11.7z". You do not actually have to download the file; all text files can be viewed anonymously from the download link, and every file within the archive is a text file. Downloading the archive requires an FOC login.

As a final comment on the setup script, the actual block-script installed is identical to the routine issued across all these years (it works perfectly, so why change it?). It is simply customised to the server on which it is installed.

A final, final comment on 'installing "above the root"': many hosting companies provide web-space where the user has FTP access to the web-root directory & below, but no higher. You will do well to avoid such a brain-dead setup, and search out a host that also provides FTP (or other) access to a directory 'above the root'. That directory is the place to put all files that you do not want the public to have web-access to. It is the simplest means to provide an extra layer of security for your site.

g1smd

12:12 am on Aug 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the bad bot script itself, I changed the three HTML Error Messages to:

echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"><html><head><title>Access Blocked</title></head><body><p><b>Server under undue load</b></p>";
echo "<p>$visits visits from your IP-Address within the last ". (( int ) ( $duration / 3600 )) ." hours.</p><p>Please wait ". (( int ) ( $wait / 3600 )) ." hours before retrying.</p></body></html>";

echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"><html><head><title>Access Blocked</title></head><body><p><b>Server under heavy load</b></p>";
echo "<p>You are scraping this site too quickly.</p><p>Please wait at least $wait secs before retrying.</p></body></html>";

echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"><html><head><title>Access Blocked</title></head><body><p><b>Server under heavy load</b></p>";
echo "<p>You are scraping this site too quickly.</p><p>Please wait at least $wait secs before retrying.</p></body></html>";

so that each one validated and used correct coding.

AlexK

11:29 pm on Aug 31, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bot-Blocker updated again to take g1smd's suggestions into account, plus a missing section added to the ReadMe on dealing with server-imposed backup. The file to look for now is "bot_blocker_v0.11.1.7z".

Thanks for your input, g1smd. Using the breakfast-rating-method (for commitment), that makes you 'pig' (as opposed to 'hen') where HTML coding validation is concerned, huh?

g1smd

12:00 am on Sep 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bacon as opposed to egg?

Not sure if I am geek enough to understand the question...

AlexK

4:18 am on Sep 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The breakfast analogy for commitment:

Bacon 'n' eggs is a good analogy for illustrating the different natures of commitment. Some people are 'Hens'; definitely committed, since they are donating their children. But they can always lay another egg. Other people are like 'Pigs'... fully committed to providing breakfast.