We can spin our wheels forever chasing all the ugly bots out there, or we can simply list (identify) the bots we want to allow.
My list starts with:
bing.com (and the associated msnbot, which also covers Yahoo)
... and what would you add to your whitelist robots.txt while disallowing all others?
This isn't meant tongue in cheek... it's a serious query as to which bots are VALID these days. Seems like we are working hard instead of smart...
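For anyone who wants to see the shape of what I mean, here is a rough sketch of a whitelist-style robots.txt, checked with Python's standard urllib.robotparser. The user-agent tokens (bingbot, msnbot) and the helper name is_allowed are my assumptions for illustration, so verify the tokens against each search engine's own documentation before relying on them. Keep in mind robots.txt is only advisory: well-behaved crawlers honor it, the uglies may not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: the named bots get an empty Disallow (nothing is
# off limits), and every other user agent falls through to the catch-all
# block, which disallows the whole site.
# The tokens "bingbot" and "msnbot" are assumptions; confirm them first.
ROBOTS_TXT = """\
User-agent: bingbot
User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, path: str = "/") -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "SomeRandomScraper" is a made-up name standing in for any unlisted bot.
    for bot in ("bingbot", "msnbot", "SomeRandomScraper"):
        print(bot, is_allowed(bot, "/index.html"))
```

Adding another bot to the whitelist would just mean one more User-agent line in the first block; everything not named stays disallowed.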