The restriction I have is that I cannot modify any of the pages on the server, as it is not our code base. I do, however, have full access to the machines (which are running Linux).
Also, we want to allow this spidering but limit each IP address to a total of 5000 hits per day, or to a certain number of hits per second.
I've looked around and found bandwidth-limiting Apache modules, but they don't seem to have any way to update the limits dynamically while the server is running. We're not hosting video or any other large content, so bandwidth limiting is of little use anyway.
I've also found some scripts that will trap spiders and block them, but they require code modifications.
I could write a program that tails the Apache log and keeps its own per-IP count, but that seems inefficient and prone to problems if it crashes, etc.
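For what it's worth, here is a minimal sketch of the log-counting idea in Python, just to show what I mean. It assumes the access log is in Common Log Format (client IP first, then a bracketed timestamp); the names over_limit, LOG_LINE and DAILY_LIMIT are made up for illustration, and it only reports offenders, it does not block anything.

```python
import re
from collections import defaultdict

# Assumption: Common Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2008:13:55:36 -0700] "GET / HTTP/1.0" 200 2326
# Group 1 is the client IP, group 2 is the date part of the timestamp.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):')

DAILY_LIMIT = 5000  # the per-IP daily cap I mentioned


def over_limit(lines, limit=DAILY_LIMIT):
    """Return the set of (ip, day) pairs with more than `limit` hits."""
    hits = defaultdict(int)
    flagged = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        key = (match.group(1), match.group(2))
        hits[key] += 1
        if hits[key] > limit:
            flagged.add(key)
    return flagged
```

Calling over_limit on an open log file would give the offenders for each day in that file. The counts live only in memory, which is exactly the crash problem I'm worried about, and something else would still have to do the actual blocking.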
Surely this isn't a new problem and someone has implemented something like I've described?
Any help appreciated!