Very much so, and good press and buzz around a project might cause me to whitelist it if I assume it will be beneficial in the long run. Then again, there's a crawler that claims to be Silicon Valley VC-backed and has been crawling for years. It claims to honor robots.txt, but I finally blocked it because, despite the constant crawling, nothing on its site has ever changed in all that time.
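A whitelist approach like that can be expressed directly in robots.txt, since compliant crawlers obey the most specific User-agent group that matches them. A sketch, assuming you only want a couple of known-good crawlers (the names here are just examples, not a recommendation):

```
# Crawlers that actually send traffic back get full access
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

# Everyone else stays out
User-agent: *
Disallow: /
```

Of course this only filters the polite bots; the ones worth billing for bandwidth are exactly the ones that ignore it.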
Check your log files: the good ones send you traffic, the rest just waste your resources.
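Sorting the good ones from the rest starts with tallying requests per user agent. A minimal sketch, assuming combined-format access log lines where the user agent is the last quoted field (the parsing is deliberately naive and the sample lines are made up):

```python
import re
from collections import Counter

# Match the last quoted field on the line (the user agent in
# Apache/nginx combined log format).
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def tally_user_agents(lines):
    """Count requests per user agent across access-log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    '1.2.3.4 - - [10/Oct/2008:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [10/Oct/2008:13:55:37 -0700] "GET /a HTTP/1.1" 200 100 "-" "MysteryCrawler/1.0"',
    '5.6.7.8 - - [10/Oct/2008:13:55:38 -0700] "GET /b HTTP/1.1" 200 100 "-" "MysteryCrawler/1.0"',
]
print(tally_user_agents(sample).most_common(1))  # → [('MysteryCrawler/1.0', 2)]
```

Cross-reference the heavy hitters against your referrer traffic and the ones taking without giving become obvious fast.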
The bigger problem I see, besides the scrapers, is the new Web 2.0 startups like Oodle, Kosmix, etc. crawling your site to analyze your content and aggregate it into their offerings whenever it matches what they want.
a) Who are they to build yet another business to make money off my back?
b) Who are they to waste my bandwidth and CPU without my permission?
There is a sense of entitlement going on: if you have content on the web, then it's fair game to make money from it under fair use. I'm sorry, but I think it's time PayPerCrawl became a valid concept. Slip me some of your VC funds via my PayPal account and I'll let you crawl maybe 1,000 pages for $5, or some such nonsense, just to help offset my costs of doing business; I wouldn't need the more expensive dual Xeon server if it weren't for all the bots.
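For what it's worth, the bookkeeping behind a PayPerCrawl scheme would be trivial. A toy sketch, using the $5-per-1,000-pages rate above and an entirely hypothetical CrawlAccount class (balances kept in mills, i.e. tenths of a cent, to avoid floating-point drift):

```python
class CrawlAccount:
    """Toy prepaid account for one crawler; balance drains per page fetched."""

    PRICE_MILLS = 5  # $5 per 1,000 pages = half a cent = 5 mills per page

    def __init__(self, funds_dollars):
        self.balance_mills = round(funds_dollars * 1000)

    def charge_page(self):
        """Deduct one page's price; return False once the balance runs dry."""
        if self.balance_mills < self.PRICE_MILLS:
            return False
        self.balance_mills -= self.PRICE_MILLS
        return True

account = CrawlAccount(funds_dollars=5.00)
pages = 0
while account.charge_page():
    pages += 1
print(pages)  # → 1000: $5 buys exactly 1,000 pages at this rate
```

Wire charge_page() into whatever serves the bot its responses and you have the whole scheme: when the balance hits zero, the crawler gets an error page instead of content until more VC money arrives.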