I've been thinking about what to serve up to those I don't want to serve!
I hope that makes sense ... i'm talking about what to serve to stealth bots ... i.e. bots that do not identify themselves as what they are and that i don't want.
(obviously those that identify themselves i can allow or block as i see fit - although blocked bots may revisit in another disguise, i know that)
there's a bunch of possible options:
200 ... but an empty or minimal file
204 ... obviously empty
401 ... unauthorized
402 ... just for my own amusement
403 ... forbidden
404 ... not found
418 ... haha
500 ... server error
503 ... unavailable
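
for reference, here's a small Python sketch that prints the standard reason phrase for each code in the list above. it only uses the standard library's http.HTTPStatus; the names CANDIDATES and describe are just mine for illustration, and 418 needs Python 3.9 or later.

```python
from http import HTTPStatus

# The candidate status codes from the list above.
CANDIDATES = [200, 204, 401, 402, 403, 404, 418, 500, 503]


def describe(code):
    """Return 'code phrase' using the standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


if __name__ == "__main__":
    for code in CANDIDATES:
        print(describe(code))
```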
quick proviso - i don't claim to be catching all bots, i'm sure some/many get through
likewise there might be the odd real user trapped who shouldn't be - that is collateral damage that i'm willing to accept.
i guess i've gone through phases, especially a 500+ stage and a 401 or 404 stage
however i'm currently inclined towards 200 with a very minimal file.
the reasoning is that:
the majority of bot runners are dumb: they feed their list to their bot and keep doing so without modification. bandwidth is cheap and they don't care, they just scrape for their own reasons.
however, of course, some are smart, some doubtless way smarter than me. i'm of the view that they have their lists of URIs, which they believe to be 'valid', so any response other than a 200 OK is likely to mean they will keep trying and, if they still fail, try again in a different disguise.
... ultimately i'm talking about the 99% ... the 1% are going to get in anyway.
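
to make the "200 with a very minimal file" idea concrete, here's a rough WSGI sketch in Python. it's only an illustration under assumptions: looks_like_stealth_bot is a hypothetical placeholder (here it just flags an empty User-Agent, which is not how anyone should really detect bots), and MINIMAL_BODY / REAL_BODY are made-up page contents.

```python
# Minimal WSGI sketch: suspected stealth bots get "200 OK" with a tiny body.
# Everything named here is hypothetical and for illustration only.

MINIMAL_BODY = b"<!doctype html><title>ok</title>"
REAL_BODY = b"<!doctype html><title>home</title><p>the real page</p>"


def looks_like_stealth_bot(environ):
    # Placeholder rule (an assumption): flag requests with no User-Agent.
    # Real detection logic would go here.
    return not environ.get("HTTP_USER_AGENT", "").strip()


def app(environ, start_response):
    # Pick the body based on the (placeholder) bot check.
    body = MINIMAL_BODY if looks_like_stealth_bot(environ) else REAL_BODY
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # Same status line either way, so the status alone gives nothing away.
    start_response("200 OK", headers)
    return [body]
```

to try it locally, the app can be handed to wsgiref.simple_server.make_server("127.0.0.1", 8000, app) and served with serve_forever(). the point of the design is that the bot's 'valid' URI still comes back as valid, it just costs next to nothing to serve.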
any thoughts?