Blocking Badly Behaved Bots
An update/fix for a very useful routine
AlexK
msg:1299766
3:07 am on Mar 10, 2005 (gmt 0)

I have been using (a slightly modified version of) the routine published on WebmasterWorld here [webmasterworld.com] for the last 18 months. It certainly stopped unruly bots and site-scrapers, but the logic was flawed somewhere.

Now, because my site is on a temporary server, and the main server is offline (but still connected to the web) whilst it gets an upgrade, I have had a chance to find out the flaw in the logic.

The "$newtime" was updated with the wrong value, causing perfectly-well-behaved bots to always get blocked if they crawled the site for long enough.

Here are some figures using the W3C link-validator [validator.w3.org], which crawls at 1 hit/sec:

iMaxVisit = 10 
iTime = 10 (seconds)
$newTime = $oldTime + $iTime;
oldT-test = ( $oldTime - $time - ( $iTime * $iMaxVisit ))

01:48:35 oldT=01:50:32 newT=01:50:42 oldT-test=17 Blocked
01:48:32 oldT=01:50:31 newT=01:50:41 oldT-test=19 Blocked
01:48:31 oldT=01:50:18 newT=01:50:28 oldT-test=7 Blocked
01:48:29 oldT=01:50:08 newT=01:50:18 oldT-test=-1
01:48:28 oldT=01:49:58 newT=01:50:08 oldT-test=-10
01:48:26 oldT=01:49:48 newT=01:49:58 oldT-test=-18
01:48:25 oldT=01:49:38 newT=01:49:48 oldT-test=-27
01:48:21 oldT=01:49:28 newT=01:49:38 oldT-test=-33
01:48:20 oldT=01:49:18 newT=01:49:28 oldT-test=-42
01:48:19 oldT=01:49:08 newT=01:49:18 oldT-test=-51
01:48:15 oldT=01:48:58 newT=01:49:08 oldT-test=-57
01:48:14 oldT=01:48:48 newT=01:48:58 oldT-test=-66
01:48:13 oldT=01:48:38 newT=01:48:48 oldT-test=-75
01:48:10 oldT=01:48:28 newT=01:48:38 oldT-test=-82
01:48:09 oldT=01:48:18 newT=01:48:28 oldT-test=-91
01:48:08 oldT=01:48:08 newT=01:48:18 oldT-test=-100

You can see that the validator gets blocked, even though there should be no problem. Changing the logic of the $newTime setting fixes it:

iMaxVisit = 10 
iTime = 10
$newTime = $oldTime + ( $iTime / $iMaxVisit );

02:06:49 oldT=02:06:49 newT=02:06:50 oldT-test=-100
02:06:48 oldT=02:06:48 newT=02:06:49 oldT-test=-100
02:06:44 oldT=02:06:44 newT=02:06:45 oldT-test=-100
02:06:43 oldT=02:06:43 newT=02:06:44 oldT-test=-100
02:06:42 oldT=02:06:42 newT=02:06:43 oldT-test=-100
02:06:40 oldT=02:06:40 newT=02:06:41 oldT-test=-100
02:06:39 oldT=02:06:39 newT=02:06:40 oldT-test=-100
02:06:37 oldT=02:06:37 newT=02:06:38 oldT-test=-100
02:06:35 oldT=02:06:35 newT=02:06:36 oldT-test=-100
02:06:34 oldT=02:06:34 newT=02:06:35 oldT-test=-100
02:06:33 oldT=02:06:33 newT=02:06:34 oldT-test=-100

iMaxVisit = 5
iTime = 10
$newTime = $oldTime + ( $iTime / $iMaxVisit );

02:15:26 oldT=02:15:33 newT=02:15:35 oldT-test=-43
02:15:25 oldT=02:15:31 newT=02:15:33 oldT-test=-44
02:15:24 oldT=02:15:29 newT=02:15:31 oldT-test=-45
02:15:21 oldT=02:15:27 newT=02:15:29 oldT-test=-44
02:15:20 oldT=02:15:25 newT=02:15:27 oldT-test=-45
02:15:19 oldT=02:15:23 newT=02:15:25 oldT-test=-46
02:15:17 oldT=02:15:21 newT=02:15:23 oldT-test=-46
02:15:16 oldT=02:15:19 newT=02:15:21 oldT-test=-47
02:15:15 oldT=02:15:17 newT=02:15:19 oldT-test=-48
02:15:13 oldT=02:15:15 newT=02:15:17 oldT-test=-48
02:15:12 oldT=02:15:13 newT=02:15:15 oldT-test=-49
02:15:11 oldT=02:15:11 newT=02:15:13 oldT-test=-50
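
To make the difference concrete, here is a minimal simulation sketch (not part of the thread's routine - the loop and variable names are purely illustrative) that replays a well-behaved 1 hit/sec crawler against both update rules, using the block test from the routine and the settings from the first set of figures:

// Simulation sketch only - not part of the routine; names are illustrative.
// Replays a crawler making one hit per second against both $newTime rules.
$iTime     = 10;    // secs; check interval
$iMaxVisit = 10;    // Maximum visits allowed within $iTime
foreach( array( 'old rule', 'fixed rule' ) as $rule ) {
    $oldTime   = 0;
    $blockedAt = 0;
    for( $time = 1; $time <= 60; $time++ ) {                // one hit per second
        if( $oldTime < $time ) { $oldTime = $time; }        // mtime never lags "now"
        if( $oldTime >= $time + ( $iTime * $iMaxVisit )) {  // same block test as the routine
            $blockedAt = $time;
            break;
        }
        $oldTime += ( $rule == 'old rule' )
            ? $iTime                                        // buggy: +10 secs per hit
            : ( $iTime / $iMaxVisit );                      // fixed: +1 sec per hit
    }
    echo $rule, ': ', ( $blockedAt ? "blocked at hit $blockedAt" : 'never blocked' ), "\n";
}

With the old rule the simulated crawler is blocked after about a dozen hits, matching the figures above; with the corrected rule it is never blocked.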

And here is the modified routine:

// -------------- Start blocking badly-behaved bots -------
$oldSetting = ignore_user_abort( TRUE );
$remote     = $_SERVER[ 'REMOTE_ADDR' ];
if(( substr( $remote, 0, 10 ) == '66.249.64.' ) or      // Google has blocks 64.233.160.0 - 64.233.191.255
   ( substr( $remote, 0, 10 ) == '66.249.65.' ) or      // Google has blocks 66.249.64.0 - 66.249.95.255
   ( substr( $remote, 0, 10 ) == '66.249.66.' ) or      // Google has blocks 72.14.192.0 - 72.14.207.255
   ( substr( $remote, 0, 9 )  == '216.239.3' ) or       // Google has blocks 216.239.32.0 - 216.239.63.255
   ( substr( $remote, 0, 9 )  == '216.239.4' ) or
   ( substr( $remote, 0, 9 )  == '216.239.5' ) or
   ( substr( $remote, 0, 10 ) == '65.54.188.' ) or      // MS has blocks 65.52.0.0 - 65.55.255.255
   ( substr( $remote, 0, 10 ) == '207.46.98.' ) or      // MS has blocks 207.46.0.0 - 207.46.255.255
   ( substr( $remote, 0, 13 ) == '66.194.55.242' )      // Ocelli
  ) {
    // let well-behaved bots through
} else {
    $iTime      = 5;    // secs; check interval
    $iMaxVisit  = 20;   // Maximum visits allowed within $iTime
    $iPenalty   = 60;   // Seconds before visitor is allowed back
    $ipLength   = 3;    // integer; 2 = 256 files, 3 = 4,096 files
    $ipLogFile  = _B_DIRECTORY . _B_LOGFILE;
    $ipFile     = _B_DIRECTORY . substr( md5( $remote ), -$ipLength );
    $logLine    = '';
    $newTime    = 0;
    $time       = time();
    $oldTime    = ( file_exists( $ipFile ))
        ? filemtime( $ipFile )
        : 0;
    if( $oldTime < $time ) { $oldTime = $time; }
    $newTime = $oldTime + ( $iTime / $iMaxVisit );
    if( $oldTime >= $time + ( $iTime * $iMaxVisit )) {
        touch( $ipFile, $time + ( $iTime * ( $iMaxVisit - 1 )) + $iPenalty );
        header( 'HTTP/1.0 503 Service Temporarily Unavailable' );
        header( 'Connection: close' );
        header( 'Content-Type: text/html' );
        echo "<html><body><p><b>Server under heavy load</b><br />";
        echo "More than $iMaxVisit visits from your IP-Address within the last $iTime secs. Please wait $iPenalty secs before retrying.</p></body></html>";
        $useragent = ( isset( $_SERVER[ 'HTTP_USER_AGENT' ]))
            ? $_SERVER[ 'HTTP_USER_AGENT' ]
            : '<unknown user agent>';
        $logLine = "$remote ". date( 'd/m/Y H:i:s' ) ." $useragent\n";
        $log     = file( $ipLogFile );
        if( $fp = fopen( $ipLogFile, 'a' )) {       // a tiny danger of 2 threads interfering; live with it
            if( count( $log ) >= _B_LOGMAXLINES ) { // otherwise grows like Topsy
                fclose( $fp );                      // flock() is disabled in some linux kernels (eg 2.4)
                array_shift( $log );                // fopen, fclose put as close together as possible
                array_push( $log, $logLine );
                $logLine = implode( '', $log );
                $fp = fopen( $ipLogFile, 'w' );
            }
            fputs( $fp, $logLine );
            fclose( $fp );
        }
        exit();
    }
    touch( $ipFile, $newTime );
}
ignore_user_abort( $oldSetting );
// -------------- Stop blocking badly-behaved bots --------

Sorry for such a long post.

 

tito
msg:1299796
11:34 pm on Apr 28, 2005 (gmt 0)

Please, in which directory does the iplog.dat file have to be placed? Do I have to chmod it?

AlexK
msg:1299797
1:44 am on Apr 29, 2005 (gmt 0)

tito:
in which directory does the iplog.dat file have to be placed?

msg #25:
Both $ipLogFile and $ipFile are created on-the-fly if not already existing.

Be careful to also read msg #29.

tito
msg:1299798
11:52 am on Apr 29, 2005 (gmt 0)

Thanks AlexK, now I see - this morning I found a brand new iplog file in my logs folder.

Great script, thanks so much,
tito

AlexK
msg:1299799
7:42 am on May 10, 2005 (gmt 0)

The final item now (possibly) is to reverse the use of atime & mtime ... I'll run the routine for another week to give it a good test

Well, it was more than a week, but I wanted to wait until the modified routine caught some scraper-on-speed, plus check that it did not block well-behaved bots. Both checks have panned out OK (once again, in what follows the `(number)' is the number of log-lines in that second of time):
Blocked IPs:
* 62.254.0.30 [ nott-cache-5.server.ntli.net ] 1000 line(s)
62.254.0.30 09/05/2005 17:13:45 (11)
62.254.0.30 09/05/2005 17:13:44 (11)
62.254.0.30 09/05/2005 17:13:43 (3)
62.254.0.30 09/05/2005 17:13:37 (9)
62.254.0.30 09/05/2005 17:13:36 (11)
62.254.0.30 09/05/2005 17:13:35 (16)
62.254.0.30 09/05/2005 17:13:34 (11)
62.254.0.30 09/05/2005 17:13:33 (7)
//-------------------------------------------------------------------------------------
What is interesting about this is that this character browses from my home-town (Nottingham, UK).

So, here is the re-written routine, incorporating all amendments:
The items prepended with an underscore are Constants that need to be define()'d somewhere in your script before this snippet of code gets used:
eg:
define( '_B_DIRECTORY', '/full/path/on/server/' );
define( '_B_LOGFILE', 'logfile.name' );
define( '_B_LOGMAXLINES', '1000' );
These constants can instead be variables or even literal values within the code - your choice.
Directory permissions: `_B_DIRECTORY` needs to be read-writeable by the apache-group.
Both $ipLogFile and $ipFile are created on-the-fly if not already existing.
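
One extra precaution worth taking at setup time (a suggestion only, not part of the routine): check that the constants really do point at a directory the web-server user can write to, since the routine does not test whether its touch() calls succeed:

// One-off sanity check, run during setup only - not part of the routine:
if( !is_dir( _B_DIRECTORY ) or !is_writable( _B_DIRECTORY )) {
    trigger_error( _B_DIRECTORY .' must exist and be writable by the web-server user', E_USER_ERROR );
}
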
//----------------Start-blocking-badly-behaved-bots---------------------------------------------
$oldSetting = ignore_user_abort( TRUE );
$remote     = $_SERVER[ 'REMOTE_ADDR' ];
$bInterval  = 10;       // secs; check interval (best < 30 secs)
$bMaxVisit  = 20;       // Maximum visits allowed within $bInterval
$bPenalty   = 60;       // Seconds before visitor is allowed back
$bTotVisit  = 500;      // total visits allowed within a 24-hr period
$bTotBlock  = 42200;    // secs; period to block long-duration scrapers
$ipLength   = 3;        // integer; 2 = 256 files, 3 = 4,096 files
$ipLogFile  = _B_DIRECTORY . _B_LOGFILE;
$ipFile     = _B_DIRECTORY . substr( md5( $remote ), -$ipLength );
$logLine    = '';
$time       = time();
$fileATime  = $time;    // access time: tracks visits
$fileMTime  = $time;    // modification time: tracks duration
if( file_exists( $ipFile )) {
    $fileATime = fileatime( $ipFile );
    $fileMTime = filemtime( $ipFile );
    // foll test keeps the tracking going to catch slow scrapers
    if((( $time - $fileATime ) > $bInterval ) or (( $time - $fileMTime ) > 84400 )) {
        $fileMTime = $fileATime = $time;
    }
    $fileATime++;
    $visits   = $fileATime - $fileMTime;
    $duration = $time - $fileMTime;    // secs
    if( $duration < 1 ) $duration = 1;
    $useragent = ( isset( $_SERVER[ 'HTTP_USER_AGENT' ]))
        ? $_SERVER[ 'HTTP_USER_AGENT' ]
        : '<unknown user agent>';
    // test for fast scrapers
    if(( $visits >= $bMaxVisit ) and (( $visits / $duration ) > ( $bMaxVisit / $bInterval ))) {
        $fileMTime = $time = $time - $bInterval;
        $fileATime = $time + $bMaxVisit + (( $bMaxVisit * $bPenalty ) / $bInterval );
        header( 'HTTP/1.0 503 Service Temporarily Unavailable' );
        header( 'Connection: close' );
        header( 'Content-Type: text/html' );
        echo "<html><body><p><b>Server under heavy load</b><br />";
        echo "$visits visits from your IP-Address within the last $duration secs. Please wait $bPenalty secs before retrying.</p></body></html>";
        $logLine = "$remote ". date( 'd/m/Y H:i:s' ) ." $useragent\n";
    } elseif( $visits >= $bTotVisit ) {    // test for slow scrapers
        $fileMTime = $time = $time - $bInterval;
        $fileATime = $time + $bMaxVisit + (( $bMaxVisit * $bTotBlock ) / $bInterval );
        header( 'HTTP/1.0 503 Service Temporarily Unavailable' );
        header( 'Connection: close' );
        header( 'Content-Type: text/html' );
        echo "<html><body><p><b>Server under undue load</b><br />";
        echo "$visits visits from your IP-Address within the last 24 hours. Please wait ". (( int )( $bTotBlock / 3600 )) ." hours before retrying.</p></body></html>";
        $logLine = "$remote ". date( 'd/m/Y H:i:s' ) ." $useragent (slow scraper)\n";
    }
    // log badly-behaved bots, then nuke 'em
    if( $logLine ) {
        touch( $ipFile, $fileMTime, $fileATime );
        $log = file( $ipLogFile );
        if( $fp = fopen( $ipLogFile, 'a' )) {       // tiny danger of 2 threads interfering; live with it
            if( count( $log ) >= _B_LOGMAXLINES ) { // otherwise grows like Topsy
                fclose( $fp );                      // flock() disabled in some kernels (eg 2.4)
                array_shift( $log );                // fopen, fclose put as close together as possible
                array_push( $log, $logLine );
                $logLine = implode( '', $log );
                $fp = fopen( $ipLogFile, 'w' );
            }
            fputs( $fp, $logLine );
            fclose( $fp );
        }
        exit();
    }
}
touch( $ipFile, $fileMTime, $fileATime );
ignore_user_abort( $oldSetting );
//----------------Stop-blocking-badly-behaved-bots----------------------------------------------

I should now be able to put this to bed. The one thought that I have had is that perhaps $bInterval should be longer, to better catch the slow scrapers. I am just wary of catching the search-bots that I want to crawl the site.
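
For reference, a quick sketch of what those settings amount to (illustrative arithmetic only, not figures from the log):

// Illustrative arithmetic only, using the settings from the routine above:
$bInterval = 10; $bMaxVisit = 20; $bTotVisit = 500; $bTotBlock = 42200;
echo 'fast test : more than ', $bMaxVisit / $bInterval, " hits/sec, averaged over at least $bMaxVisit hits\n";
echo 'slow test : ', $bTotVisit, " hits accumulated before the tracking file is reset\n";
echo 'slow block: roughly ', round( $bTotBlock / 3600, 1 ), " hours\n";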

WebCurious
msg:1299800
5:18 am on May 13, 2005 (gmt 0)

In reading this thread, I had a few comments/observations. I'll state up front, though, that I've not run any large website -- the most I've done is a rather largish home-page and ancillary pages.

1) Re: DDoS: Blocking single IP's won't be effective if the attacker has enough remote machines. “Researchers estimate that the number of zombie machines in botnets increases by 300,000 to 350,000 every month,” notes Gostev, and “the total number of zombies is estimated at several million.” (http://esj.com/enterprise/article.aspx?EditorialsID=1364)

2) Even if I block 10,000 bots, how fast can a webserver serve up the "Access Denied" message? In some cases, I wonder if one could programmatically tell one's firewall to send ICMP redirects (back to the originating machine) or ICMP "unreachable" ...not sure how that would play with routing "valid" traffic though.

3) If I am not worried about someone making a copy for their local browsing and they have a fast mirroring tool, I'd "try" to base denial of service on my webserver's load. It seems unnecessary to block fast requesters if one's webserver is lightly loaded.

4) Instead of serving up scripted pages with every page-load, one might use "squid" in its webserver-accelerator mode to serve up static content. With judicious use of the last modified date, squid could reload a dynamic page only when the underlying database of content has changed. I.e. - if a dynamic page is database driven, only have squid update its cached, static page when the database causes the page to change. If there is a need to serve up random/rotating "ads", maybe the squid cached page could use a dynamic element only for the "ad". This is especially useful if one is serving up ads from a 3rd party ad-placement service. Alternatively one could force squid to reload the page every "N" seconds to pick up a new "ad". "N" could be a low number if server load is "low", and higher if server load is high.

5) I might consider that IPs aren't always constant -- especially for home users. Even some DSL ISPs use DHCP, and a given machine may have a new IP every time it reboots. IP reuse by different users should be considered when deciding on "penalties". I.e. - being banned for minutes is probably harmless, but banning for days is another matter -- suppose one or more of my favorite site-users gets accidentally put on a "banned list" -- not the end of the world, but an inconvenience, nevertheless.

6) If the webserver runs a script every time to decide access, isn't that a separate process creation, a file access, and a reply with content? If one's webserver supports access controls by directory, it might be more efficient to modify an access file like ".htaccess" on the fly. Apache "should" be more efficient at blocking access than dynamically serving up one's own "access denied" page. It seems like a return code of "503 Service Temporarily Unavailable" might be a good choice for temporary blocking.

Hope I haven't written too much -- was just some random thoughts I had when reading this thread... :-)

Linda

AlexK
msg:1299801
8:52 am on May 13, 2005 (gmt 0)

Welcome to WebMistressWorld, WebCurious! A nicely thought-out message with some excellent points.

First issue: do you have bandwidth costs for your site? If `yes' (and you run PHP) then the script is useful. If `no' (or you do not have access to PHP) then it is academic, for at least *one* of its principal values.

There are 2 main uses for the script:

    1 To stop home-users trying to download the entire site.
    2 To thwart competitors doing the same as item#1, but slowly across an entire day(s).

The snippet of script given above is proven in #1, but has yet to be proven (for me) with #2. There are 2 main values for a webmaster from the script:
    3 reduces bandwidth costs
    4 reduces server load

Typical of #1 are download scripts/programs such as Website eXtractor or WebZIP (from examples blocked in the past). As an example of #4, the character mentioned in msg #34 was trying to download my site at 19 pages a second at the peak. My stats also indicate that across a ~3 month period the server has dished up 1,989 status-503 (blocked) pages; how many more pages would have been sucked from the server if not stopped? This is a big issue for webmasters.

1) Re: DDoS: Blocking single IP's won't be effective if the attacker has enough remote machines.
Absolutely true - the script as it stands is useless against this. The place to stop such things (if at all possible) is at the Firewall.

3) ... I'd "try" to base denial of service on my webserver's load.
Here is a snippet of script to do that:

/*
 * _freebsd_loadavg() - Gets the max() system load average from uptime(1)
 *
 * The max() Load Average will be returned
 */
function _freebsd_loadavg() {
    $buffer = `uptime`;
    ereg( "averag(es|e): ([0-9][.][0-9][0-9]),([0-9][.][0-9][0-9]),([0-9][.][0-9][0-9]*)", $buffer, $load );
    return max(( float ) $load[ 2 ], ( float ) $load[ 3 ], ( float ) $load[ 4 ]);
}// _freebsd_loadavg()

/*
 * _linux_loadavg() - Gets the max() system load average from /proc/loadavg
 *
 * The max() Load Average will be returned
 */
function _linux_loadavg() {
    $buffer = '0 0 0';
    $f      = fopen( '/proc/loadavg', 'r' );
    if( !feof( $f )) $buffer = fgets( $f, 1024 );
    fclose( $f );
    $load = explode( ' ', $buffer );
    return max(( float ) $load[ 0 ], ( float ) $load[ 1 ], ( float ) $load[ 2 ]);
}// _linux_loadavg()
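
A possible way to wire these in (a sketch only - the 2.0 threshold and the PHP_OS test are assumptions, not anything from this thread) is to run the blocking checks only when the box is actually busy:

// Sketch only: run the blocking checks only when the server is under load.
// The 2.0 threshold and the PHP_OS test are assumptions - tune for your box.
$load = ( stristr( PHP_OS, 'bsd' ))
    ? _freebsd_loadavg()
    : _linux_loadavg();
if( $load >= 2.0 ) {
    // ... run the blocking routine from msg #36 here ...
}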

6) ... it might be more efficient to modify an access file like ".htaccess" on the fly.
Modifying it is easy... de-modifying it is another matter.
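
To illustrate why: a ban written into .htaccess has to carry its own expiry and be cleaned out again later. A rough sketch of the idea only (the file location, the "# expires" marker format and the one-hour ban are all assumptions, and there is no locking here):

// Rough sketch only - not a tested recipe. Each ban is written as a
// "deny from IP" line followed by a "# expires TIMESTAMP" marker, and
// expired pairs are stripped out again on a later pass.
$htaccess = $_SERVER[ 'DOCUMENT_ROOT' ] .'/.htaccess';    // assumed location
$now      = time();
$lines    = file_exists( $htaccess ) ? file( $htaccess ) : array();
$kept     = array();
foreach( $lines as $line ) {
    if( preg_match( '/^# expires (\d+)/', trim( $line ), $m ) and ( $m[ 1 ] < $now )) {
        array_pop( $kept );    // drop the "deny from ..." line written just before it
        continue;              // and drop the expired marker itself
    }
    $kept[] = $line;
}
// add a new one-hour ban for the current visitor
$kept[] = 'deny from '. $_SERVER[ 'REMOTE_ADDR' ] ."\n";
$kept[] = '# expires '. ( $now + 3600 ) ."\n";
if( $fp = fopen( $htaccess, 'w' )) {
    fputs( $fp, implode( '', $kept ));
    fclose( $fp );
}

Every request that runs this has to re-read and re-write the whole file, which is exactly the sort of housekeeping the mtime/atime tracking above avoids.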

The point is that this script works and is easy. Other solutions are certainly possible. Why do you not explore them, and present your (improved) solution? That way, we all improve. I, for one, will certainly look forward to it.

ann
msg:1299802
1:16 pm on Jun 6, 2005 (gmt 0)

Quick questions:

#1 where do you put the script, in the root?
#2 How is it called? is it linked anywhere?

As you can tell I am clueless, but would really like to have it running.

I have no access to the server configs and cannot use .htaccess.

Thanks,

Ann

AlexK
msg:1299803
4:16 pm on Jun 6, 2005 (gmt 0)

ann:
#1 where do you put the script, in the root?

The simple answer is: "anywhere" - but that doesn't really help, so here is how I do it:

I've got a file of pre-written sub-routines (function()s) which is included at the top of each web-script:

require_once( '/server/path/to/include.file' );

The routine in msg#36 is at the top of this include file.

#2 How is it called? is it linked anywhere?
This question is answered by the above: as soon as a web-script is called by a browser, the include.file is run first, and then the rest of the PHP on that page. Put it all as close to the top of your scripts as possible and any blocking will be done immediately.
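
In practice each public page then starts like this (a skeleton only; the path is the same placeholder as above):

<?php
// Page skeleton - the path is the placeholder used earlier in the thread.
// The blocking routine sits at the top of include.file, so a blocked
// visitor receives the 503 before any of the page below is generated.
require_once( '/server/path/to/include.file' );

// ... the rest of the page's PHP and HTML output follows here ...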

ann
msg:1299804
12:06 am on Jun 7, 2005 (gmt 0)

Thanks,

Will give it a try! :)

Ann

AlexK
msg:1299805
5:46 am on Jun 21, 2005 (gmt 0)

I am just wary of catching the search-bots that I want to crawl the site.

Finally, after all these months, the routine caught a slow-scraper. Unfortunately, it was a GBot!

Blocked IPs:
* 66.249.66.172 [ crawl-66-249-66-172.googlebot.com ] 128 line(s)
128 total lines in log-file.
Log line                                  Count
66.249.66.172 21/06/2005 01:48:41 1
66.249.66.172 21/06/2005 01:48:31 1
66.249.66.172 21/06/2005 01:48:21 1
66.249.66.172 21/06/2005 01:48:11 1
66.249.66.172 21/06/2005 01:47:57 1
66.249.66.172 21/06/2005 01:47:47 1
66.249.66.172 21/06/2005 01:47:37 1
66.249.66.172 21/06/2005 01:47:26 1
66.249.66.172 21/06/2005 01:47:16 1
66.249.66.172 21/06/2005 01:47:03 1
66.249.66.172 21/06/2005 01:46:52 1
66.249.66.172 21/06/2005 01:46:39 1
66.249.66.172 21/06/2005 01:46:25 1
...
66.249.66.172 21/06/2005 01:24:17 1
66.249.66.172 21/06/2005 01:24:06 1
66.249.66.172 21/06/2005 01:23:53 1
66.249.66.172 21/06/2005 01:23:43 1
66.249.66.172 21/06/2005 01:23:32 1
66.249.66.172 21/06/2005 01:23:21 (slow scraper) 1

I guess that $bTotVisit= 500; (total visits allowed within a 24-hr period) is too low - at this GBot's rate of roughly 1 hit every 10 secs, 500 visits are clocked up in well under 2 hours.

Still, all functions are now proven. They all work fine.

thetrasher
msg:1299806
9:22 pm on Jun 25, 2005 (gmt 0)

Maybe you shouldn't block well-behaved bots. See the code in msg #1.

AlexK
msg:1299807
9:00 am on Jun 26, 2005 (gmt 0)

If it walks like a duck, waddles like a duck and quacks like a duck, it's most likely a duck. Same with scrapers...

There's more than one kind of GBot, identified both by the user-agent string and by their behaviour:


1 HTTP/1.0 Googlebot/2.1 (+http://www.google.com/bot.html)
2 HTTP/1.1 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

The first one crawls at a reasonable 1 hit/10-secs, the second at up to 3 hits/sec. The first adds the content it crawls to the G index, the second does not. So far this month, the first has crawled 1,390 pages on my site; the second has crawled 21,571 pages - and remember, not one of those gets added to the index.
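
If one did want to treat the two variants differently rather than rely on the rate test alone, the quoted strings are enough to tell them apart - a sketch only (the variable names are illustrative), and user-agents are trivially forged, so the behaviour-based test above remains the safer yardstick:

// Sketch only: separate the two Googlebot variants quoted above by user-agent.
// Variable names are illustrative; remember user-agent strings can be forged.
$ua = isset( $_SERVER[ 'HTTP_USER_AGENT' ] ) ? $_SERVER[ 'HTTP_USER_AGENT' ] : '';
$isIndexingGbot = ( strpos( $ua, 'Googlebot/2.1' ) !== false )
               && ( strpos( $ua, 'Mozilla/5.0' )   === false );    // variant 1 above
$isMozillaGbot  = ( strpos( $ua, 'Googlebot/2.1' ) !== false )
               && ( strpos( $ua, 'Mozilla/5.0' )   !== false );    // variant 2 above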

Here is the latest sequence on IP:66.249.65.232 that got this [#2] b*stard blocked:


$bInterval = 7; // secs; check interval (best < 30 secs)
$bMaxVisit = 14; // Maximum visits allowed within $bInterval
[26/Jun/2005:06:15:38 +0100] "GET /search.php?start=5454&next=1 HTTP/1.1" 200 7348
[26/Jun/2005:06:15:39 +0100] "GET /search.php?start=4418&next=1 HTTP/1.1" 200 7361
[26/Jun/2005:06:15:39 +0100] "GET /search.php?start=3497&next=1 HTTP/1.1" 200 7260
[26/Jun/2005:06:15:40 +0100] "GET /search.php?start=5354&prev=1 HTTP/1.1" 200 7451
[26/Jun/2005:06:15:40 +0100] "GET /search.php?eeprom=245625-01&macro=9&with=1 HTTP/1.1" 200 6452
[26/Jun/2005:06:15:41 +0100] "GET /mfcs.php?mid=118&nid=13945 HTTP/1.1" 200 6366
[26/Jun/2005:06:15:41 +0100] "GET /search.php?start=5454&prev=1 HTTP/1.1" 200 7344
[26/Jun/2005:06:15:41 +0100] "GET /search.php?start=3267&next=1 HTTP/1.1" 200 7180
[26/Jun/2005:06:15:42 +0100] "GET /search.php?eeprom=PCMCIA%5C1456VQC_DATA%20FAX_PCMCIA_MODEM-C17A&with=1 HTTP/1.1" 200 6402
[26/Jun/2005:06:15:42 +0100] "GET /search.php?start=4688&next=1 HTTP/1.1" 200 7412
[26/Jun/2005:06:15:43 +0100] "GET /search.php?start=9372&prev=1 HTTP/1.1" 200 7243
[26/Jun/2005:06:15:43 +0100] "GET /search.php?start=2517&prev=1 HTTP/1.1" 200 7620
[26/Jun/2005:06:15:44 +0100] "GET /search.php?start=5505&next=1 HTTP/1.1" 200 7399
[26/Jun/2005:06:15:44 +0100] "GET /search.php?start=4354&prev=1 HTTP/1.1" 503 146

The routine is doing exactly what I want it to do - blocking unruly bots. Whether from Google or anybody else.

---------------------
next part
Blocking Badly Behaved Bots #3 [webmasterworld.com]

[edited by: jatar_k at 3:37 pm (utc) on Oct. 11, 2005]
