v3Exceed - 11:42 pm on Aug 5, 2012 (gmt 0)
From a client's perspective, there is no real difference between crawling and indexing, and for the purposes of this thread the distinction isn't really relevant either. The crawling of external sites and their inclusion in Bing's index are fundamentally entwined as far as this discussion goes.
The major concern for us is that Bing triggers the bot trap and is then blocked from all of the information on the site. Although some developers will argue that these scraper bots pose no risk, we and others who employ bot traps and honeypots do so because we recognize a real threat from them.
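For anyone unfamiliar with how these traps work, here is a minimal sketch. All names are hypothetical: the trap URL is listed as Disallowed in robots.txt and only linked in a way human visitors never follow, so any client that requests it has by definition ignored robots.txt and gets its IP banned — which is exactly how a non-compliant bingbot ends up locked out of the whole site.

```python
# Hypothetical honeypot sketch. TRAP_PATH is an assumed URL; a real
# deployment would persist blocked_ips to a file or database.
TRAP_PATH = "/bot-trap/"
blocked_ips = set()

def handle_request(ip, path):
    """Return True if the request should be served, False if blocked."""
    if ip in blocked_ips:
        return False          # previously trapped: refuse everything
    if path.startswith(TRAP_PATH):
        blocked_ips.add(ip)   # visitor ignored robots.txt: ban it
        return False
    return True
```

Once a crawler's IP lands in the blocklist, every subsequent request is refused, legitimate or not.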
Our clients are still listed on Bing, but via links from other sites rather than direct crawling. As I understand it, Yahoo blends Bing, Yandex and other crawlers' results into its search results, so not being directly crawled by Bing hasn't really made much of an impact.
The first file a search bot is supposed to fetch, if present, is robots.txt. Based on that information it then crawls the site, adding the permitted folders and pages to its index. When a bot ignores robots.txt, there really isn't any way for developers to steer it toward the right information except through internal links or a sitemap.
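For reference, the directives in question are about as simple as it gets. A minimal robots.txt (paths here are hypothetical) that a compliant crawler should honor looks like this:

```
# Keep all crawlers out of the trap and private areas
User-agent: *
Disallow: /bot-trap/
Disallow: /private/

# Point compliant bots at the preferred crawl path
Sitemap: https://www.example.com/sitemap.xml
```

A crawler that fetches and honors this file never touches /bot-trap/ and never triggers the ban; one that skips the file walks straight into it.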
Regardless, if Google and others can follow the simple directives in a robots.txt, there is no excuse for Bing not to follow suit. The suggestion that links can always be removed from Bing manually may work for a person with a few sites, but at volume it just doesn't scale.