blend27 - 4:20 pm on Feb 25, 2011 (gmt 0)
This is the way I program it on the sites I work on these days:
1. Basic .htaccess takes care of the malformed/known-scraper UAs, plus blacklisted and abusive bot UAs. (Access data gets recorded, mostly IPs that are candidates for the blacklisted IP ranges to look at later.)
I am on IIS servers so can't use the goodies that come with Apache.
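The UA filtering in step 1 could be sketched roughly like this (Python as a stand-in; the poster's actual setup is .htaccess-style rules on IIS, and the patterns below are purely illustrative, not their real blacklist):

```python
import re

# Illustrative patterns only -- NOT the poster's actual blacklist.
BAD_UA_PATTERNS = [
    re.compile(r"^$"),                        # empty UA string (malformed)
    re.compile(r"(libwww|curl|wget)", re.I),  # common scraping toolkits
    re.compile(r"EmailCollector", re.I),      # example of a known abusive bot
]

def ua_blocked(user_agent: str) -> bool:
    """Return True if the UA matches any blacklisted pattern."""
    return any(p.search(user_agent) for p in BAD_UA_PATTERNS)
```

Matched requests would be denied and their IPs logged as candidates for the blacklisted ranges mentioned above.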
2. A query held in memory has the whitelisted bots' IP ranges (a few only), so I compare the IP against them; if it passes, I record it (all the good stuff).
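The whitelist range check in step 2 amounts to an IP-in-CIDR test. A minimal Python sketch, assuming the ranges are kept as network objects in memory (the two example ranges are assumptions, not the poster's actual whitelist):

```python
import ipaddress

# Example ranges only -- real whitelists come from the engines' published data.
WHITELISTED_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),  # e.g. a crawler range (assumption)
    ipaddress.ip_network("157.55.0.0/16"),   # e.g. another crawler range (assumption)
]

def is_whitelisted(ip: str) -> bool:
    """True if the visitor IP falls inside any whitelisted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELISTED_RANGES)
```

Keeping the parsed networks in memory means each request only pays for a handful of containment tests rather than a database round trip.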
If not in those ranges:
3. The IP gets checked against known colo/hosting ranges and some manually blacklisted ranges. If it is in those ranges, the first page served is a small human check with a captcha-style question generated on the fly; the contents and style of the page are random.
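The "generated on the fly" part of step 3 could look something like this sketch: a random question with a known answer, so the page differs on every load (the question templates are made up for illustration; the poster's real challenge pages also randomize layout and style):

```python
import random

def make_challenge():
    """Generate a simple arithmetic question and its expected answer."""
    a, b = random.randint(2, 9), random.randint(2, 9)
    templates = [
        ("What is {} plus {}?", a + b),
        ("What is {} times {}?", a * b),
    ]
    text, answer = random.choice(templates)
    return text.format(a, b), answer
```

The server would store the expected answer in the session and only serve real content once the visitor answers correctly.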
4. Every request then runs through:
a. UA check again
b. IP checked against a DB index from the banned table
c. IP range check against the whitelisted ranges (yes, I only check for whitelisted at this point); access from OTHER ranges gets recorded.
Step 4 is wrapped in a SPEED trap that controls in/out content; if triggered, the IP gets banned for a specific time.
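A speed trap of this kind is essentially a sliding-window rate limiter. A minimal sketch, with made-up thresholds (the poster does not state their actual window or limit):

```python
import time
from collections import defaultdict, deque

WINDOW = 10.0   # seconds per window (illustrative value)
MAX_HITS = 8    # requests allowed inside the window (illustrative value)

hits = defaultdict(deque)  # ip -> timestamps of recent requests

def speed_trap(ip, now=None):
    """Record a hit; return True if this IP just tripped the trap."""
    now = time.time() if now is None else now
    q = hits[ip]
    while q and now - q[0] > WINDOW:  # drop hits outside the window
        q.popleft()
    q.append(now)
    return len(q) > MAX_HITS
```

When the trap returns True, the IP would be written to the banned table with an expiry, so step 4b catches it on later visits.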
If passed, content is served (including bot-trap links). But wait, there is more.....
5. I design my sites with lots of CSS and JS, and I use background images that are generated/served via a server-side script. Those images and files are referenced in the HTML, the external CSS, and the external JS, so IF those files are not requested, the TRAP page (human check) gets served after 2 pages of HTML have been served; until proven innocent, the visitor is guilty.
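The step 5 logic works because real browsers fetch the linked CSS/JS/image assets while most scrapers only fetch HTML. A minimal sketch of the server-side bookkeeping (all names and the two-page threshold follow the description above; the code itself is an assumption, not the poster's implementation):

```python
# Minimal sketch of the "did this visitor fetch the assets?" check.
html_pages = {}     # ip -> number of HTML pages served
got_assets = set()  # ips that requested the script-generated assets

def record_asset_request(ip):
    """Called by the script that serves the generated images/CSS/JS."""
    got_assets.add(ip)

def serve_page(ip):
    """Serve 'content' normally; serve the 'human_check' trap page once a
    visitor has taken two HTML pages without ever requesting the assets."""
    html_pages[ip] = html_pages.get(ip, 0) + 1
    if html_pages[ip] > 2 and ip not in got_assets:
        return "human_check"
    return "content"
```

Passing the human check would clear the flag and let the visitor continue as normal.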
I block first and ask questions later, and I'm not afraid to lose a visitor if something quacks like a duck.