v3Exceed - 9:13 pm on Aug 5, 2012 (gmt 0)
We manage hundreds of smaller business sites. We have found that Bing does indeed ignore the robots.txt directives for whatever reason, and placing a removal request is NOT a reasonable expectation for a real business.
For a while, we had been watching scraper bots from Eastern European countries copy our clients' websites for a variety of malicious purposes. We had even seen a copy of a client's site hosting advertising on a foreign-owned network.
To combat this, we integrated a bot trap. The trap only fires when a crawler ignores robots.txt and accesses content we have intentionally disallowed.
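For anyone curious how this kind of trap works, here is a minimal sketch. The paths, class name, and blocking logic are all hypothetical illustrations, not our actual implementation: robots.txt disallows a bait path, and any client that requests that path anyway has demonstrated it ignores robots.txt, so it gets blocked from then on.

```python
# Hypothetical sketch of a robots.txt bot trap.
# /trap/ is a bait path that no well-behaved crawler should ever visit,
# because robots.txt explicitly disallows it.

ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"

class BotTrap:
    def __init__(self):
        self.blocked_ips = set()

    def handle(self, ip, path):
        """Return an (http_status, body) pair for a request."""
        if ip in self.blocked_ips:
            # This client previously walked into the trap: refuse it.
            return 403, "Forbidden"
        if path == "/robots.txt":
            return 200, ROBOTS_TXT
        if path.startswith("/trap/"):
            # Only a crawler ignoring robots.txt ever reaches this path.
            self.blocked_ips.add(ip)
            return 403, "Forbidden"
        return 200, "Normal page content"
```

A crawler that honors robots.txt never requests `/trap/`, so it is never blocked; one that ignores the file is flagged on its first visit to the bait path and refused on every request afterward.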
Well, guess who we consistently catch... Bing. In all of its iterations and from all of its IPs, Bing goes after the information we have asked not to be indexed. Even adjusting the robots.txt to use the specific syntax Microsoft says Bing will honor does not work.
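For reference, this is roughly what we mean by addressing Bing's crawler directly: a group naming bingbot (and the older msnbot) in addition to the wildcard rule. The disallowed path here is just a placeholder:

```
User-agent: bingbot
Disallow: /private/

User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow: /private/
```

Even with the bingbot group spelled out like this, the disallowed content still gets crawled.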
Our final solution is to completely ignore Bing, since Yandex, Google, Yahoo, and the rest all index correctly without fail. We don't have the time or energy to constantly ask Bing to remove entries that it shouldn't have indexed in the first place.
The idea of using the noindex tag is moot, because our websites are entirely generated on the fly. There is no subdirectory or index page to tag separately.
Bing should really get its act together before it is further relegated to the abyss in which the Zune currently resides.
It is in Bing's interest to conform to the robots.txt standard, because the websites that developers produce are what give a search engine its value. They need to respect it if they expect to compete with Google, or even Yahoo, in the future.