Or is this too personal a question?
I run dynamic sites, and I've had hundreds of page requests arrive in a single minute; by then the server is bogged down and unresponsive until the queue clears out. By being PROACTIVE and checking on every page request, I've shut their speedy butts down within just a few pages and my server keeps serving pages. If I only checked once a minute, being REACTIVE, they'd have my site at a standstill for quite a few minutes.
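For anyone wondering what "checking per page" means in practice, here's a minimal sketch of the idea in Python. It's not my actual code, and the names and thresholds are made up for illustration: every request bumps a per-IP counter over a sliding one-minute window, and the page is refused the moment the count crosses the line, instead of waiting for a log pass to notice.

    # Hypothetical per-request check -- thresholds and names are illustrative only.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60        # look at the last minute of traffic
    MAX_REQUESTS = 30          # allow this many page views per IP per window

    _hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow_request(ip):
        """Return False once an IP exceeds the per-minute page limit."""
        now = time.time()
        hits = _hits[ip]
        hits.append(now)
        # drop timestamps that have fallen out of the window
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) <= MAX_REQUESTS

Call allow_request() at the top of the page script and bail out with a 403 (or a challenge page) when it returns False; the fast scraper gets cut off after a handful of pages while normal visitors never notice.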
Besides, there are certain things you can only do in real-time, which you cannot detect in a log file, at least not the standard log files anyway.
Most log files don't record whether a request came through a proxy server. Checking in real time lets me track the individuals coming in through things like Google's translator, which is an occasional source of scraping, instead of just blocking it entirely because of too much access.
There are also subtle clues in the request headers that tell me whether it's a real mobile device or someone spoofing a cell phone, a Treo, or some such device, or whether it's a CGI/PHP proxy server, lots of little clues that are lost in the log file.
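Roughly the kind of thing I mean, in a simplified Python sketch. The header names and tests here are examples of the idea, not my actual checks: proxies tend to leave fingerprints like Via or X-Forwarded-For, and a user agent that claims to be a phone but sends desktop-style Accept headers is suspect.

    # Simplified illustration only -- real checks look at much more than this.
    PROXY_HEADERS = ("VIA", "X_FORWARDED_FOR", "FORWARDED")

    def looks_like_proxy(headers):
        """Flag requests carrying the header fingerprints proxies leave behind."""
        return any(h in headers for h in PROXY_HEADERS)

    def looks_like_fake_mobile(headers):
        """A UA that claims to be a phone without phone-style Accept headers is suspect."""
        ua = headers.get("USER_AGENT", "").lower()
        accept = headers.get("ACCEPT", "").lower()
        claims_mobile = any(tag in ua for tag in ("midp", "symbian", "treo", "blackberry"))
        sends_wap = "vnd.wap" in accept or "wml" in accept
        return claims_mobile and not sends_wap

None of that is visible after the fact in a standard combined log format, which only keeps the user agent and referrer.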
Lastly, you can't easily challenge a visitor with a CAPTCHA after they've already hit and run; a post-mortem log file review is too late. In real time I can see whether the visitor just kept asking for additional pages while the CAPTCHA was displayed (bot), OR... answered the CAPTCHA in a couple of tries (human).
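In rough Python terms the logic is something like this. It's a toy sketch, with the session handling and the CAPTCHA itself stubbed out, but it shows the behavioral split I'm describing:

    # Toy sketch -- session storage and CAPTCHA verification are assumed to exist elsewhere.
    MAX_ATTEMPTS = 3       # a human should solve it within a few tries
    MAX_IGNORED = 5        # a bot just keeps requesting pages past the challenge

    def handle_challenged_visitor(session, solved_captcha):
        """Decide what to do with a visitor who has already been challenged."""
        if solved_captcha and session.get("attempts", 0) <= MAX_ATTEMPTS:
            session["challenged"] = False   # human: let them back in
            return "serve page"
        session["ignored"] = session.get("ignored", 0) + 1
        if session["ignored"] > MAX_IGNORED:
            return "block as bot"           # ignored the challenge, kept pulling pages
        return "serve captcha again"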
That makes it easy to link a scraper to its scrapings online, and it's trivial to find them since the search engines point them out to me.
Can't do that reviewing a log file ;)