jmccormac - 5:30 am on Aug 30, 2013 (gmt 0)
The iptables approach is useful because it can block ports 80/443 selectively, and it also cuts the unwanted traffic before it ever reaches Apache. The main worry is how many IP range rules iptables can hold before it starts hurting server performance, since rules in a chain are matched linearly. I've blocked China and a few other problem countries on a relatively low-powered server with no major impact.
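For what it's worth, one way around the rule-count worry is ipset: the ranges go into a hash-based set and a single iptables rule checks the whole set, so the lookup cost stays roughly constant no matter how many ranges are loaded. A minimal sketch (the set name and the example range are mine, not from any real blocklist):

    # create a set that stores CIDR ranges in a hash (set name is arbitrary)
    ipset create blocked_ranges hash:net
    # add ranges to the set; 198.51.100.0/24 is just a placeholder
    ipset add blocked_ranges 198.51.100.0/24
    # one rule matches the entire set against ports 80/443 and drops it
    iptables -I INPUT -p tcp -m multiport --dports 80,443 \
        -m set --match-set blocked_ranges src -j DROP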
Using Apache with Deny statements generally results in a full 403 page being served unless it is replaced with a customised minimal response. Many scrapers are quite braindead: they ignore 403s and keep right on hammering sites. The indexed DBM file approach is one I hadn't thought of, though.
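For anyone curious about the DBM route: mod_rewrite's RewriteMap can read a dbm: map, and the httxt2dbm utility converts a plain-text list into the indexed file. Something along these lines should work, though the file paths and map name here are just examples:

    # convert a plain-text blocklist (one "IP value" pair per line) to DBM
    httxt2dbm -i blocked.txt -o blocked.map

    # then in the Apache config:
    RewriteEngine On
    RewriteMap ipblock "dbm:/path/to/blocked.map"
    RewriteCond ${ipblock:%{REMOTE_ADDR}|OK} !=OK
    RewriteRule ^ - [F]

The DBM lookup is indexed rather than a linear scan, which is the whole attraction, but the [F] flag still serves a 403, so the braindead scrapers will still keep knocking.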
The IP range list is essentially a nuclear option as it blocks most data centres. It is a list of ranges rather than individual IPs, so while there may be some collateral damage for people using data centre IPs as web proxies, it should kill about 98% of scrapers. The main disadvantage with an iptables approach shows up on a multi-site webserver where each site might need a separate set of rules: iptables works at the IP layer and never sees the Host header, so per-site rules only work when each site has its own IP address. That's where iptables, for me at least, begins to change from a simple solution to a complex one.
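To illustrate that multi-site problem: the only per-site handle iptables has is the destination IP, so a per-site rule set ends up looking something like the sketch below, and it only works when each site is bound to its own address (the addresses and set names are invented). With name-based virtual hosts sharing one IP, the distinction disappears at the firewall and the blocking has to move back up into Apache.

    # site one on its own IP, checked against its own blocklist set
    iptables -I INPUT -d 192.0.2.10 -p tcp -m multiport --dports 80,443 \
        -m set --match-set site1_blocked src -j DROP
    # site two needs a parallel rule and a set of its own
    iptables -I INPUT -d 192.0.2.20 -p tcp -m multiport --dports 80,443 \
        -m set --match-set site2_blocked src -j DROP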