incrediBILL - 1:00 am on Jan 9, 2013 (gmt 0)
he got me thinking
Sorry about that.
Won't let it happen again! :)
Those headers can make a huge difference: browsers all send pretty much the same headers all the time, but bots do not, so it's much easier (for now) to trap tons of garbage via header analysis than via user agents. Once I started looking at the headers in more detail, I was able to block a lot more than I did before.
Just for fun I ran a test and disabled the data center blocking list to see how much the header tests would stop all by themselves. It was amazing: a simple bit of code blocked most of the current crawler crud without user agent parsing, data center block lists, blacklists or any of the other time-consuming crap.
Sadly, some bots do a better job of faking headers, and you still need all the other stuff to block them, but at least they're currently the minority.
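
For anyone wondering what a header test of this kind might look like, here is a minimal sketch in Python. It is not the actual code or rule set discussed above: the function names, the list of expected headers (Accept, Accept-Language, Accept-Encoding) and the `max_missing` threshold are all assumptions picked for illustration, based only on the idea that browsers send a consistent set of headers and many bots do not.

```python
# Hypothetical header sanity check. The header names and rules below are
# illustrative assumptions, not the actual tests described in this post.

# Headers that mainstream browsers normally send on an ordinary page request.
EXPECTED_HEADERS = ("accept", "accept-language", "accept-encoding")


def missing_browser_headers(headers):
    """Return the expected headers that are absent or empty."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return [name for name in EXPECTED_HEADERS if not normalized.get(name)]


def looks_like_bot(headers, max_missing=0):
    """Flag a request when more than max_missing expected headers are absent."""
    return len(missing_browser_headers(headers)) > max_missing


# Example requests (made-up values for illustration).
browser_request = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}
bare_request = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}

print(looks_like_bot(browser_request))  # False
print(looks_like_bot(bare_request))     # True
```

Note that the second request claims a browser user agent but still gets flagged, which is the point of checking headers instead of user agents. A bot that copies a full browser header set would pass this check, so it would only ever be one layer among the others.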