lucy24 - 11:37 pm on Feb 12, 2012 (gmt 0)
Googlebot, Yandex, Slurp & Bingbot
I was afraid you'd say that. As noted in the OP, I've ended up locking out both YahooCacheSystem and Yahoo! Slurp because I don't care for their behavior and they don't do me any good that I can see.
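For anyone curious about the mechanics, here is a minimal sketch of a User-Agent lockout like the one described above. It is written in Python rather than htaccess syntax. The fragment list and the `status_for` function are hypothetical, and the case-sensitive substring match is an assumption. On Apache the same idea is usually expressed with `SetEnvIf`/`BrowserMatch` or a `mod_rewrite` condition on `%{HTTP_USER_AGENT}`.

```python
# Hypothetical list of User-Agent fragments to refuse. The bot names come
# from the post; matching by case-sensitive substring is an assumption.
BLOCKED_UA_FRAGMENTS = ("Yahoo! Slurp", "YahooCacheSystem")


def status_for(user_agent: str) -> int:
    """Return 403 if the User-Agent contains a blocked fragment, else 200."""
    if any(fragment in user_agent for fragment in BLOCKED_UA_FRAGMENTS):
        return 403  # locked out
    return 200  # everyone else gets the page


slurp = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
google = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for(slurp))
print(status_for(google))
```

The client supplies the User-Agent header, so a check like this only stops robots that identify themselves honestly.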
What happens if there is a new bot and you don't know about it? You may be losing customers.
Whole nother thread there ;) If the big sites routinely lock out all but the major, established search engines, then there's a fabulous market in "The best of the rest" searches just waiting to be tapped.
And the day someone scrapes my content is the day I fall dead of shock, so let's call that a non-issue :) I'm deliberately excluding the ebooks here, which are public domain (out of copyright in the US) and widely available from other sources.
Still looking for someone who can explain to me in words of two syllables how you distinguish a robot from a human in the first place.
you give several IP ranges that I would never have permitted or, once blocked, released.
If you read carefully, you'll note that I deliberately didn't distinguish between blocked and unblocked. I looked at everyone, including the ones who never got anything but a fistful of 403s. I didn't keep before-and-after copies of the htaccess, so I don't know how many blocks were restored promptly at month's end. I know I've still got an awful lot commented out. Most of those are the visitors who once annoyed me terribly but haven't shown their faces since. On the other side are the "Well, OK, let's not overdo it" cases, like the Ukrainians, who simply have to be locked out even though they really don't do anything.
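Here is a rough sketch, in Python and under stated assumptions, of how active and commented-out IP blocks in an htaccess file might be told apart and checked. It assumes Apache 2.2-style `Deny from` lines that hold full addresses or CIDR ranges. `parse_deny_rules` and `is_blocked` are hypothetical helpers. Other forms (hostnames, partial IPs such as `10.1`, `all`) are simply skipped. The addresses are RFC 5737 documentation ranges, not anyone's real blocklist.

```python
import ipaddress


def parse_deny_rules(text):
    """Split 'Deny from' lines into active and commented-out networks.

    Hypothetical helper: only full IPs and CIDR ranges are recognised;
    anything else after 'Deny from' is skipped.
    """
    active, commented = [], []
    for raw in text.splitlines():
        line = raw.strip()
        target = active
        if line.startswith("#"):  # commented-out rule: remember it separately
            target = commented
            line = line.lstrip("#").strip()
        parts = line.split()
        if len(parts) >= 3 and parts[0].lower() == "deny" and parts[1].lower() == "from":
            for host in parts[2:]:
                try:
                    # strict=False tolerates host bits set in a CIDR range
                    target.append(ipaddress.ip_network(host, strict=False))
                except ValueError:
                    continue  # hostname, partial IP, "all", etc.
    return active, commented


def is_blocked(ip, networks):
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


rules = """\
Deny from 192.0.2.0/24
# Deny from 198.51.100.0/24
Deny from 203.0.113.7
"""
active, commented = parse_deny_rules(rules)
print(len(active), len(commented))
print(is_blocked("192.0.2.55", active))
print(is_blocked("198.51.100.9", active))
```

Keeping the commented-out ranges in their own list makes it easy to see which old blocks would still match a visitor if they were switched back on.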