I've written a simple setup + install script for the bot-blocker routine. It will be of particular interest to those with little or no PHP experience. There is a part of the script that I wanted to include but did not, because I could not get it to work, so this post includes a request for help from more experienced coders at the bottom.
My thanks to docbird for inspiring the install routine. He sent me a sticky back on April 28, 2007 containing some very practical suggestions. That is the longest I have ever taken to answer a sticky, but I trust that he will agree that it was worth the wait.
Next, the request for help. The install routine customises the script to work on the server where it is run and then, at the very end (after the script is installed), tests that it works.
The following works fine:
<?php
// test the bot_blocker file
// ($bot_blocker and $ipFile are set earlier in the setup script)
echo "Testing `$bot_blocker'...<br />\n";
if( file_exists( $bot_blocker )) {
    if( file_exists( $ipFile )) {
        echo "<b class='oops'>(sorry, `$ipFile' already exists & cannot test)</b><br />\n";
    } else {
        echo "Using one include() iteration...<br />\n";
        include( $bot_blocker );
        if( file_exists( $ipFile )) {
            echo "<b class='ok'>Success!: tracking-file `$ipFile' created.</b><br /><br />\n";
        } else {
            echo "<b class='oops'>Fatal Error: tracking-file `$ipFile' not created.</b><br />\n";
        }
    }
} else {
    echo "<b class='oops'>(sorry, prog-write failed & cannot test)</b><br />\n";
}
?>
The following, however, does not do what I expected:
for( $i = 1; $i <= 10; $i++ ) {
    include( $bot_blocker );
}
On the first iteration $ipFile's atime is increased by one, but the rest of the iterations do not increase it. I'm sure that I am missing something very obvious here. Can anybody see it?
[edited by: coopster at 7:12 pm (utc) on Aug. 13, 2008]
error:
Performing atime-disabled check...
Whoops! Cannot create the tracking file.
$ipFile = `308' ($ipLength=`3').
$startTime = `1218633744'.
$hitsTime = `1218633744'.
Whilst trying to create one of the zero-byte tracking files, that action threw an error. This install script will not be able to function. This is typically a permissions issue, and indicates that the web-server user may have read-only permission to the directory. That is a VERY good security option, but will prevent this install script from going any further. On *nix, the directory will likely need to be at least chmod 0775.
My system is SuSE 10.
chmod 0775 some_user
chown apache.apache some_user
My script location: '/home/botchecken/bot_blocker.php'
I have run:
chmod 0775 /home/botchecken/
chown -R wwwrun.www /home/botchecken/
some_user: '/home/botchecken/'
less /etc/passwd
wwwrun:_:_:_:WWW daemon apache:/var/lib/wwwrun:/bin/false
drwxrwxrwx 4 wwwrun www 4096 13. Aug 05:45 botchecken
drwxrwxrwx 2 wwwrun www 4096 13. Aug 05:08 block
-rw-r--r-- 1 wwwrun www 5175 13. Aug 05:47 bot_blocker.php
drwxrwxrwx 2 wwwrun www 4096 13. Aug 04:52 logs
I have no idea what is wrong with the permission structure.
Please help!
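The setup message above reports that creating a zero-byte tracking file failed. A minimal sketch of that kind of check, assuming PHP 7.1+ and using a hypothetical helper `probeTrackingDir()` (this is not the setup script's own code), tries to create and then remove a file inside the target directory and explains a failure:

```php
<?php
// Hypothetical pre-flight probe (not the setup script's own code): can the
// PHP process create, and then remove, a zero-byte file in $dir?
// Returns null on success, or a short explanation on failure.
function probeTrackingDir(string $dir): ?string
{
    if (!is_dir($dir)) {
        return "'$dir' is not a directory";
    }
    // Build an absolute path inside $dir. A bare name such as '308' would be
    // resolved against the current working directory instead.
    $probe = rtrim($dir, '/') . '/probe_' . getmypid();
    if (!@touch($probe)) {
        return "cannot create files in '$dir' (check its owner and chmod)";
    }
    unlink($probe);
    return null;
}

// Example: var_dump(probeTrackingDir('/home/botchecken/block'));
```

Running it once as the web-server user (from a throwaway page, not the CLI) shows whether the directory is writable by the account that matters.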
I've already replied to you on my own forums, but here is a little extra:
The Apache User & Group are set in `httpd.conf' and tend to be `apache' or `nobody'. I suspect from your `/etc/passwd' extract that they are the latter. That means Apache is neither the owner nor a member of the owning group of your web-space, so you will have to set the permissions to:
chmod 0777 /home/botchecken
chmod 0777 /home/botchecken/setup.php
PS
Your English is superior to my German!
[edited by: AlexK at 4:15 pm (utc) on Aug. 13, 2008]
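To compare a directory against the chmod/chown advice above, a small diagnostic can print its permission bits, its owner and the user PHP is actually running as. This is a sketch assuming PHP 7+; `describeDir()` is a hypothetical name, and the user-name lookups need the posix extension, which may not be loaded on every host:

```php
<?php
// Hypothetical diagnostic: report a directory's mode, owner, writability,
// and the user the PHP process runs as (falls back to numeric uids).
function describeDir(string $dir): array
{
    clearstatcache(); // make sure we are not looking at cached stat data
    $uid = fileowner($dir);
    $owner = function_exists('posix_getpwuid')
        ? (posix_getpwuid($uid)['name'] ?? (string) $uid)
        : (string) $uid;
    $runningAs = function_exists('posix_geteuid')
        ? (posix_getpwuid(posix_geteuid())['name'] ?? (string) posix_geteuid())
        : 'unknown (posix extension not loaded)';
    return [
        'mode'       => substr(sprintf('%o', fileperms($dir)), -4), // e.g. "0775"
        'owner'      => $owner,
        'running_as' => $runningAs,
        'writable'   => is_writable($dir),
    ];
}

// Example: print_r(describeDir('/home/botchecken'));
```

Note that `get_current_user()` is not a substitute for the `running_as` value: it returns the owner of the running script file, not the user of the PHP process.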
The following gets stopped mid-loop, proving that the routine works:
for( $i = 2; $i < 30; $i++ ) {
    echo "Include #$i... ";
    clearstatcache();
    include( $bot_blocker );
    $startTime = filemtime( $ipFile );
    $hitsTime = fileatime( $ipFile );
    echo "mtime = $startTime; atime = $hitsTime.<br />\n";
}
The key change is the clearstatcache() call before each include(). Well, whatever: it works now.
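The run shown next suggests how hits are counted: the tracking file's mtime stays fixed while its atime moves forward one second per include. A minimal sketch of that timestamp-counter idea, assuming this reading of the output (it is not the bot_blocker source, and `recordHit()` is a hypothetical name), shows where clearstatcache() fits:

```php
<?php
// Sketch of a timestamp counter: mtime marks the start of the tracking
// period, atime is pushed forward one second per hit, so atime - mtime
// equals the number of hits recorded so far.
function recordHit(string $ipFile): int
{
    if (!file_exists($ipFile)) {
        touch($ipFile); // zero-byte file; mtime and atime both set to now
    }
    clearstatcache(); // without this, filemtime()/fileatime() may return cached values
    $startTime = filemtime($ipFile);
    $hitsTime  = fileatime($ipFile) + 1;
    touch($ipFile, $startTime, $hitsTime); // keep mtime, bump atime explicitly
    return $hitsTime - $startTime;
}
```

Setting the atime explicitly with touch() does not rely on the filesystem updating atimes on reads, which is why a zero-byte file is enough.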
Here is a run that I did just now on my home server:
Finally... Program Test
Testing `bot_blocker.php'...
Using one include() iteration...
Success!: tracking-file `example.co.uk/www/block/07c' created.
Now multiple iterations...
(the following section of code will throw warnings due to `headers already sent'... sorry, no way to avoid this)
Include #2... mtime = 1218644453; atime = 1218644453.
Include #3... mtime = 1218644453; atime = 1218644454.
Include #4... mtime = 1218644453; atime = 1218644455.
Include #5... mtime = 1218644453; atime = 1218644456.
Include #6... mtime = 1218644453; atime = 1218644457.
Include #7... mtime = 1218644453; atime = 1218644458.
Include #8... mtime = 1218644453; atime = 1218644459.
Include #9... mtime = 1218644453; atime = 1218644460.
Include #10... mtime = 1218644453; atime = 1218644461.
Include #11... mtime = 1218644453; atime = 1218644462.
Include #12... mtime = 1218644453; atime = 1218644463.
Include #13... mtime = 1218644453; atime = 1218644464.
Include #14... mtime = 1218644453; atime = 1218644465.
Include #15...
Server under heavy load
You are scraping this site too quickly. Please wait at least 121 secs before retrying.
It's quite bizarre what a kick I get from seeing the block happening.
The changed file is due to be uploaded shortly. It is currently called `bot_blocker.7z'.
[edited by: coopster at 7:14 pm (utc) on Aug. 13, 2008]
[edit reason] exemplified [/edit]
On some filesystems, the atime won't be updated at all.
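Whether an explicitly set atime survives can be tested directly. A hypothetical version of such an "atime-disabled" check (PHP 7+; `atimeIsUsable()` is an illustrative name, not the setup script's function) sets the atime one second after the mtime, re-reads both with the stat cache cleared, and compares:

```php
<?php
// Hypothetical atime check: returns true if the filesystem under $dir keeps
// an explicitly set atime (and leaves the mtime alone), false otherwise.
function atimeIsUsable(string $dir): bool
{
    $probe = tempnam($dir, 'atime');
    if ($probe === false) {
        return false; // cannot even create a file here
    }
    $mtime = time() - 60;
    touch($probe, $mtime, $mtime + 1); // mtime, then atime one second later
    clearstatcache();                  // force a fresh stat() for the reads below
    $ok = (fileatime($probe) === $mtime + 1) && (filemtime($probe) === $mtime);
    unlink($probe);
    return $ok;
}
```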
PS
Will you edit the top-link in my first post please, coopster? It currently goes through a double `pubcon.com' redirect and lands on the WebmasterWorld home page, which is not the intended destination.
Does anyone have suggestions for avoiding the 0755 directory-permissions problem? The directory has to be like that for setup.php to operate, but needs to be different afterwards.
More specifically: what is the best way to prevent web access to a specific directory universally, without knowing either the server OS or the web-server software? Does such a way exist?
First try with the new setup script, on Aug 13, 2008:
the zero-byte tracking file is not created,
and no bot_blocker.*** is created.
ls -al /home/botchecken/
drwxr-xr-x 4 root root 4096 13. Aug 04:46 .
drwxr-xr-x 20 root root 4096 21. Jul 17:06 ..
drwxrwxrwx 2 root root 4096 14. Aug 00:54 block
-rwxrwxrwx 1 root root 8714 13. Aug 05:07 bot_blocker.php
drwxrwxrwx 2 root root 4096 13. Aug 04:52 logs
ls -al /srv/www/htdocs/setup.php
-rwxrwxrwx 1 root root 20701 14. Aug 00:48 /srv/www/htdocs/setup.php
apache error_log:
PHP Warning: touch(): Unable to create file 308 because Permission denied in /srv/www/htdocs/setup.php on line 128
Second try, running the all-bots setup script directly, on Aug 07, 2008:
the zero-byte tracking file 27e is created,
but no bot_blocker.*** is created.
ls -al /home/botchecken/block/
drwxrwxrwx 2 root root 4096 14. Aug 00:54 .
drwxrwxrwx 4 root root 4096 14. Aug 00:50 ..
-rw-r--r-- 1 wwwrun www 0 14. Aug 00:54 27e
.htaccess:
<IfModule mod_php5.c>
php_value auto_prepend_file "/home/botchecken/bot_blocker.php"
</IfModule>
apache error_log:
[error][client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45
[error] [client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45, referer: http:mywebsite.com/
Test script for the all-bots setup:
<?php
// test the bot_blocker file
$bot_blocker='/home/botchecken/bot_blocker.php';
echo "Testing `$bot_blocker'...<br />\n";
if( file_exists( $bot_blocker )) {
    // $ipFile is only defined once $bot_blocker has been included,
    // so guard the pre-check with isset() to avoid an undefined-variable notice
    if( isset( $ipFile ) && file_exists( $ipFile )) {
        echo "<b class='oops'>(sorry, `$ipFile' already exists & cannot test)</b><br />\n";
    } else {
        echo "Using one include() iteration...<br />\n";
        include( $bot_blocker );
        if( file_exists( $ipFile )) {
            echo "<b class='ok'>Success!: tracking-file `$ipFile' created.</b><br /><br />\n";
        } else {
            echo "<b class='oops'>Fatal Error: tracking-file `$ipFile' not created.</b><br />\n";
        }
    }
} else {
    echo "<b class='oops'>(sorry, prog-write failed & cannot test)</b><br />\n";
}
?>
Then I removed this code from .htaccess:
<IfModule mod_php5.c>
php_value auto_prepend_file "/home/botchecken/bot_blocker.php"
</IfModule>
Tested again:
Info:
Testing `/home/botchecken/bot_blocker.php'...
Using one include() iteration...
Success!: tracking-file `/home/botchecken/block/27e' created
[edited by: minibear at 1:03 am (utc) on Aug. 14, 2008]
Otherwise every server type is different, and each has multiple ways to achieve it. In Apache you can "deny" access in httpd.conf or from .htaccess, or you can rewrite external (HTTP) URL requests to a path that does not exist, so generating a 404 error for the user.
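For Apache specifically, the .htaccess route can be scripted. A sketch, assuming Apache with `AllowOverride` permitting access-control directives in that directory (other servers, nginx for example, ignore .htaccess entirely); `denyWebAccess()` is a hypothetical helper that writes rules for both the Apache 2.4 (`Require`) and 2.2 (`Order`/`Deny`) syntaxes:

```php
<?php
// Hypothetical helper: write an .htaccess into $dir (e.g. block/ or logs/)
// that refuses all HTTP requests for files inside it.
function denyWebAccess(string $dir): bool
{
    $rules = <<<HTACCESS
<IfModule mod_authz_core.c>
    Require all denied
</IfModule>
<IfModule !mod_authz_core.c>
    Order deny,allow
    Deny from all
</IfModule>

HTACCESS;
    return file_put_contents(rtrim($dir, '/') . '/.htaccess', $rules) !== false;
}
```

Checking afterwards that a request for a tracking file returns 403 is the only reliable confirmation, since an `AllowOverride None` setting makes Apache silently ignore the file.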
As I thought, a permissions issue. Easily fixed.
It seems that you are creating bot_blocker.php in one directory, then checking it in another.
`/home/botchecken' is `0755 root.root'. Apache will have read-only permission to that directory. It also looks likely that Apache is not set up for that directory. You do not give the permission structure for /srv/www/htdocs, so I cannot comment on that one.
The fix is exactly as in my earlier post. If the directory is web-accessible, do not forget to change the permissions back afterwards (but not for block/ & logs/).
Probably the best thing is to create a single directory off the root of the web-space, and place everything within that. There will be little problem with folks accessing bot_blocker.php directly, in any case. The only result that they can have is to block themselves! My main desire was to prevent direct access to the tracking files. I guess that folks should use the most easily-accessible mechanism of their web-server to do that, and the setup script should leave them to do that themselves.
apache error_log:
[error][client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45
[error] [client] PHP Notice: Undefined index: REMOTE_ADDR in /home/botchecken/bot_blocker.php on line 45, referer: http:mywebsite.com/
Finally, this is the only problem I have left :)
Line 45 is:
$ipRemote = ${$_SERVER_ARRAY}['REMOTE_ADDR'];
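The notice means the array read on line 45 had no `REMOTE_ADDR` key. One thing worth ruling out: the PHP manual notes that with `auto_globals_jit` enabled, use of `$_SERVER` is detected at compile time, so reaching it only through a variable variable such as `${$_SERVER_ARRAY}` may not cause it to be populated. A defensive sketch (not the shipped code; `remoteIp()` is a hypothetical name, PHP 7.1+) reads `$_SERVER` directly and copes with a missing or malformed value:

```php
<?php
// Defensive sketch: return the client IP, or null if it is missing
// (e.g. a command-line run) or not a valid IPv4/IPv6 address.
function remoteIp(array $server): ?string
{
    if (!isset($server['REMOTE_ADDR'])) {
        return null;
    }
    $ip = filter_var($server['REMOTE_ADDR'], FILTER_VALIDATE_IP);
    return $ip === false ? null : $ip;
}

// Naming $_SERVER literally lets the compiler see it, so it is populated
// even when auto_globals_jit is on.
$ipRemote = remoteIp($_SERVER);
```

The caller then decides what a null IP means; skipping the block check is the gentlest option.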
Version 0.11 released. New! Improved!
Taking on the lessons of this thread, the setup & install script has been re-constituted. It now allows the root-directory for the bot_blocker script to be any (pre-existing) directory that the web-server has permission to write to. All script and sub-directory names can be customised.
This now means that the bot-blocker can be installed "above the root", even via http. It also guarantees that the bot-blocker script and all associated files will be within a single directory, which makes maintenance far simpler.
You can find the download file via the library link (first post on this thread, and first post on the library thread). The file to look for is "bot_blocker_v0.11.7z". You do not actually have to download the file; all text files can be viewed anonymously from the download link, and every file within the archive is a text file. Downloading the archive requires an FOC login.
As a final comment on the setup script, the actual block-script installed is identical to the routine issued across all these years (it works perfectly, so why change it?). It is simply customised to the server on which it is installed.
A final, final comment on 'installing "above the root"': many hosting companies provide web-space where the user has FTP access to the web-root directory & below, but no higher. You will do well to avoid such a brain-dead setup, and search out a host that also provides FTP (or other) access to a directory 'above the root'. That directory is the place to put all files that you do not want the public to have web-access to. It is the simplest means to provide an extra layer of security for your site.
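An install script could warn when the chosen directory is not actually above the root. A sketch, assuming PHP 7.1+ and a hypothetical `isInsideDocRoot()` helper, compares fully resolved paths and returns null when either path does not exist:

```php
<?php
// Hypothetical check: true if $path is the web root or lies beneath it
// (so may be reachable over HTTP), false if it lies elsewhere (e.g. above
// the root), null if either path cannot be resolved.
function isInsideDocRoot(string $path, string $docRoot): ?bool
{
    $real = realpath($path);
    $root = realpath($docRoot);
    if ($real === false || $root === false) {
        return null;
    }
    // Trailing separators stop '/srv/www2' from matching '/srv/www'.
    $root = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strncmp($real . DIRECTORY_SEPARATOR, $root, strlen($root)) === 0;
}
```

In a web request, the second argument would typically be `$_SERVER['DOCUMENT_ROOT']`.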
echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"><html><head><title>Access Blocked</title></head><body><p><b>Server under undue load</b></p>";
echo "<p>$visits visits from your IP-Address within the last ". (( int ) ( $duration / 3600 )) ." hours.</p><p>Please wait ". (( int ) ( $wait / 3600 )) ." hours before retrying.</p></body></html>";
echo "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"><html><head><title>Access Blocked</title></head><body><p><b>Server under heavy load</b></p>";
echo "<p>You are scraping this site too quickly.</p><p>Please wait at least $wait secs before retrying.</p></body></html>";
so that each one validated and used correct coding.
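Beyond valid markup, a block page can also tell clients when to retry. A sketch (PHP 7.1+; the 503 status and `Retry-After` header are an addition, not something the original routine is known to send, and all function names are hypothetical) builds the "heavy load" page and sends headers only after checking `headers_sent()`, the condition behind the warnings seen in the multi-include test:

```php
<?php
// Build the "heavy load" page (same text as the routine's message).
function blockPage(int $wait): string
{
    return "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">"
        . "<html><head><title>Access Blocked</title></head><body>"
        . "<p><b>Server under heavy load</b></p>"
        . "<p>You are scraping this site too quickly.</p>"
        . "<p>Please wait at least $wait secs before retrying.</p></body></html>";
}

// Whole hours to wait, rounded up so a short wait never shows as 0 hours.
function waitHours(int $wait): int
{
    return (int) ceil($wait / 3600);
}

// Send the page with a 503 status and Retry-After (in seconds), then stop.
function sendBlock(int $wait): void
{
    if (!headers_sent()) {
        http_response_code(503);
        header('Retry-After: ' . $wait);
    }
    echo blockPage($wait);
    exit;
}
```

`waitHours()` rounds up because the `(int)( $wait / 3600 )` cast in the "undue load" message truncates, so a wait of 3599 seconds would be reported as 0 hours.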
Thanks for your input, g1smd. Using the breakfast-rating-method (for commitment), that makes you 'pig' (as opposed to 'hen') where HTML coding validation is concerned, huh?
Bacon 'n' eggs is a good analogy for illustrating the different natures of commitment. Some people are 'Hens'; definitely committed, since they are donating their children. But they can always lay another egg. Other people are like 'Pigs'... fully committed to providing breakfast.