dstiles - 7:51 pm on Apr 2, 2013 (gmt 0)
I agree with keyplr: block ALL hosting/server farms EXCEPT those IPs you KNOW carry legitimate AND USEFUL bots. If you keep tabs on things, you will not be caught out by a new "important search engine", and in any case one would take years to become prominent and useful. There is almost no "real" traffic from server farms: what would a server do with one of your web pages other than scrape or otherwise abuse its contents?
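As a rough illustration of "block the farm, except the known-good bot IPs inside it", here is a minimal sketch using Python's standard `ipaddress` module. The ranges are placeholders taken from the RFC 5737 documentation blocks, not real hosting or bot ranges, and `status_for` is a hypothetical helper name. The allowlist is checked first so a known bot inside a blocked farm still gets through.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), standing in for real
# hosting/server-farm ranges you have decided to block.
HOSTING_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# Placeholder exceptions: sub-ranges you KNOW carry legitimate, useful bots.
ALLOWED_BOT_RANGES = [
    ipaddress.ip_network("203.0.113.64/27"),
]


def _in_any(ip, networks):
    """True if the address falls inside any of the given networks."""
    return any(ip in net for net in networks)


def status_for(client_ip: str) -> int:
    """Return the HTTP status to send: 403 for server-farm IPs, unless allowlisted."""
    ip = ipaddress.ip_address(client_ip)
    if _in_any(ip, ALLOWED_BOT_RANGES):  # exceptions win over the farm block
        return 200
    if _in_any(ip, HOSTING_RANGES):
        return 403
    return 200  # not a listed farm: treat as ordinary traffic
```

In practice the same logic usually lives in the web server or firewall config rather than application code; the sketch only shows the ordering of the checks.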
I probably go further than most here by blocking the IP ranges of Microsoft, Google and several others EXCEPT for their bots, and even then only "real" bots: for example, I reject image bots.
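One way to express "only the main crawlers, no image bots" for requests coming from those company ranges is a user-agent check. This is a sketch under assumptions: the token lists are illustrative (`Googlebot-Image` and `msnbot-media` are examples of image/media crawler names) and should be checked against what actually shows up in your logs. Note that `googlebot-image` contains `googlebot`, so the reject list must be tested first.

```python
# Illustrative user-agent substrings (lowercase); adjust to your own logs.
ALLOWED_BOT_TOKENS = ("googlebot", "bingbot")
REJECTED_BOT_TOKENS = ("googlebot-image", "msnbot-media")


def status_for_company_range(user_agent: str) -> int:
    """Status for a request from a company IP range (e.g. Microsoft/Google)."""
    ua = user_agent.lower()
    # Check rejections first: "googlebot-image" also contains "googlebot".
    if any(tok in ua for tok in REJECTED_BOT_TOKENS):
        return 403
    if any(tok in ua for tok in ALLOWED_BOT_TOKENS):
        return 200
    # Anything else from these ranges is not a bot we want.
    return 403
```

User-agent strings are trivially forged, so this only makes sense for IPs already known to belong to the company in question.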
I have a relatively small web server hosting a couple of dozen small-scale web sites. This month is not yet two days old and I have already logged about 5500 unwanted hits from fake search-engine bots, hackers, scrapers and virus-implanters. Blocking server farms is a good way to limit the damage: none of those got more than a 403.
By the way: it isn't only servers that send out fake Bing/Google bots. Several of those I'm currently seeing (and rejecting) come from compromised broadband IPs. There are millions of those, far more than compromised servers. And hits from servers are very often deliberate attacks or scrapes anyway.
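Fake bots on broadband IPs can't be caught by range blocking, but a claimed Googlebot or bingbot can be checked with a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the domain, then resolve that hostname back and confirm it includes the original IP. This is a minimal sketch; the hostname suffixes are assumptions to verify against each search engine's current documentation, and the resolvers are passed in as parameters (defaulting to the standard `socket` calls) so the logic can be tested offline.

```python
import socket
from typing import Callable, Iterable

# Assumed hostname suffixes for genuine crawlers; confirm against current docs.
BOT_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Iterable[str]:
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_genuine_bot(
    ip: str,
    user_agent: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """True only if the UA claims a known bot AND forward-confirmed rDNS agrees."""
    ua = user_agent.lower()
    for token, suffixes in BOT_SUFFIXES.items():
        if token in ua:
            try:
                host = reverse(ip).lower().rstrip(".")
                if not host.endswith(suffixes):
                    return False  # PTR points somewhere else, e.g. a broadband ISP
                return ip in forward(host)  # PTR must resolve back to the same IP
            except OSError:  # socket.herror / socket.gaierror are OSError subclasses
                return False
    return False  # UA does not claim to be a bot we verify
```

The forward step matters because whoever controls an IP's reverse zone can make its PTR record say anything, including a googlebot.com name.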