incrediBILL - 9:48 am on Oct 22, 2011 (gmt 0)
Something to think about when anyone talks about "whitelisting" robots: "All the important search engines are already established, so there's no point in wasting your bandwidth on the upstarts, outsiders and wannabes."
Whitelisting doesn't mean ignoring upstarts whatsoever. My recommendation is to wait until they're a viable player, at least in beta, before letting them put a drain on your resources. For example, the hundreds of toy Nutch crawlers (wannabes) people have running amok aren't worth the bandwidth unless one of them actually goes public as an SE. Likewise, the flood of toy crawler crap running off AWS (Amazon Web Services) isn't worth allowing access at this time. However, promising new spiders like Yandex and even Blekko have been allowed onto my whitelist.
Sites that whitelist and then ignore whatever comes knocking do so at their own peril, just like sites that blacklist: both methods require ongoing monitoring to stay effective and keep a site healthy.
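To make that concrete, here's a minimal sketch of the kind of whitelist-plus-monitoring check I mean. The bot tokens, hostname suffixes, and log filename are just illustrative assumptions, not anyone's production rules: allow a short list of known crawlers (verified by reverse DNS so a scraper can't just spoof the user-agent), and log everything else so you can review the rejects later instead of ignoring them.

# Hypothetical sketch: whitelist known crawlers, log everything else for review.
# The bot tokens and host suffixes below are illustrative, not a definitive list.
import logging
import socket

ALLOWED_BOTS = {
    "googlebot": ".googlebot.com",
    "bingbot": ".search.msn.com",
    "yandexbot": ".yandex.ru",
}

logging.basicConfig(filename="rejected_bots.log", level=logging.INFO)

def crawler_allowed(user_agent: str, ip: str) -> bool:
    ua = user_agent.lower()
    for token, host_suffix in ALLOWED_BOTS.items():
        if token in ua:
            # Verify the claimed identity with a reverse DNS lookup so a
            # spoofed user-agent string alone doesn't get through.
            try:
                host = socket.gethostbyaddr(ip)[0]
            except socket.herror:
                break
            if host.endswith(host_suffix):
                return True
            break
    # Not whitelisted: deny, but keep a record so the logs can be reviewed
    # later for new crawlers worth adding -- that's the monitoring part.
    logging.info("rejected UA=%r ip=%s", user_agent, ip)
    return False

A real setup would also forward-confirm the DNS result and rate-limit repeat offenders, but the point is the same: whitelist plus review the logs, not whitelist and forget.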
Besides, just because a site isn't in the initial index of a new SE doesn't mean it won't get indexed quickly once you whitelist that crawler when it's ready to be a primetime player.
This is roaming too far off topic, but the OP has pointed out the perils of relying on a single SE: you have to expand your traffic sources to keep a site viable. That doesn't mean letting any old crawler access your site, nor does it invalidate the process of whitelisting crawlers.
Whether it's a success or not is all in the execution; more simply put, the devil's in the details. :)