AlexK - 6:38 am on Apr 21, 2005 (gmt 0)
I want to stop competitors from crawling my pages ... until they have my entire site.
I'm looking into a modification that will track that behaviour and just shut them down after a few hundred pages.
Hmmm. The routine as written creates an empty file ($ipFile) and uses the mod/access time of this file to activate a block. The same file could also contain content...
1 Use ob_start() at the beginning of the page
2 Use ob_get_contents() and strlen() at the end.
3 Add that byte-count to the running total stored in $ipFile
4 Adjust mod/access time
5 Check against your limits during pre-processing
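Put together, the five steps above might look something like this. A minimal sketch only: the file name, the 5 MB limit and the 403 response are my illustrative assumptions, not part of the original routine.

```php
<?php
// Sketch of the pre/post split described above. $ipFile and $byteLimit
// are illustrative assumptions, not names from the original routine.
$ipFile    = sys_get_temp_dir() . '/bytes-' . md5($_SERVER['REMOTE_ADDR'] ?? '0');
$byteLimit = 5 * 1024 * 1024;   // e.g. cut a visitor off after ~5 MB served

// --- pre-process (step 5): check the running total before serving ---
$total = is_file($ipFile) ? (int) file_get_contents($ipFile) : 0;
if ($total > $byteLimit) {
    header('HTTP/1.1 403 Forbidden');
    exit('Byte limit exceeded.');
}

ob_start();                                   // step 1: buffer the page

echo '... normal page output ...';

// --- post-process (steps 2-4): add this page's size to the total ---
$bytes = strlen(ob_get_contents());           // step 2: measure the page
file_put_contents($ipFile, $total + $bytes);  // step 3: increment $ipFile
touch($ipFile);                               // step 4: refresh mod/access time
ob_end_flush();                               // send the buffered output
```

Because the whole page sits in the output buffer until the end, the post-process steps can still run (and even replace the page) after the content is generated.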
You owe me a beer; a pint of ice-cold Guinness, please.
Y'know, after preening myself on my clever, clever reply, it has occurred to me that there is an even simpler solution...
As written, the routine pre-processes a page. What you want would require that it is split to both pre- and post-process a page:
The routine tracks the number of visits in $visits. It would be very easy to add a check for the gross number of visits. $visits is reset to zero when the mod-time falls behind the current time (a hit-rate of less than 1 a sec, which admittedly is 86,400 hits in 24 hours). My experience across the last 18 months, however, is that *no-one* scraping a site has the patience to hit your site so slowly (look at msg#9). Hence my original comment.
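For what it's worth, that reset logic can be sketched like so. The file name and the 200-page cut-off are my guesses for illustration; the actual routine differs.

```php
<?php
// Sketch of the $visits counter: the file's mod-time is pushed one
// second forward per hit, so it only falls behind the clock when the
// visitor averages under 1 hit/sec -- at which point $visits resets.
// File name and the 200-page threshold are illustrative assumptions.
$ipFile = sys_get_temp_dir() . '/visits-' . md5($_SERVER['REMOTE_ADDR'] ?? '0');
$now    = time();

if (is_file($ipFile) && filemtime($ipFile) >= $now) {
    $visits = (int) file_get_contents($ipFile) + 1;  // still hitting fast
    $mtime  = filemtime($ipFile) + 1;                // advance 1 sec per hit
} else {
    $visits = 1;                                     // slow visitor: reset
    $mtime  = $now + 1;
}
file_put_contents($ipFile, $visits);
touch($ipFile, $mtime);

if ($visits > 200) {     // gross-visits check: "a few hundred pages"
    header('HTTP/1.1 503 Service Unavailable');
    exit;
}
```

A visitor sustaining 1 hit/sec or faster keeps the mod-time at or ahead of the clock and so keeps accumulating $visits; anyone slower is forgiven and starts again from zero.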