Facebook has updated its robots.txt file so that the site can only be crawled by a short list of search engines, including Google, Microsoft's Bing, China's Baidu, Russia's Yandex, and a few others.
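For anyone wondering what a whitelist-style robots.txt looks like in practice, here is a minimal sketch. The rules below are a made-up illustration, not Facebook's actual file; the crawler tokens, the `example.com` URL, and the `is_allowed` helper are all assumptions for the example. It uses Python's standard `urllib.robotparser` to show how a listed and an unlisted crawler would be treated.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style rules (NOT Facebook's real file):
# a few named crawlers may fetch everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: Yandex
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/profile"  # placeholder URL
    for agent in ("Googlebot", "ExampleResearchBot"):
        print(agent, is_allowed(agent, page))
```

Keep in mind that robots.txt is only a request: a well-behaved crawler honors it, but nothing in the file itself enforces the block.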
Previously, Facebook's robots.txt allowed anyone to crawl the site, although the company had threatened to sue at least one developer for crawling, before adding new terms of service that barred scraping without the company's written permission. Some, including programmer and blogger Pete Warden, whom Facebook threatened to sue, had complained that the social networking site was breaking the rules of the interwebs. The site was allowing unfettered crawling, but the company's legal team was not.
Not really. Even though they are not yet on the list, their visit will leave a trace in the logs. All they have to do is identify themselves in their UA and put a clear description of their goals on their site; then we can check them out and decide whether or not to whitelist them as well.
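As a rough sketch of the "trace in the logs" part, something like the following could pull out user agents that are not yet on the list so they can be reviewed. It assumes the common "combined" access-log layout, where the user agent is the last quoted field; the `ALLOWED_TOKENS` list, the `unlisted_agents` helper, and the sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical whitelist: lowercase substrings that mark an already-approved crawler.
ALLOWED_TOKENS = ("googlebot", "bingbot", "baiduspider", "yandex")

# Assumes the "combined" log format, where the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def unlisted_agents(log_lines):
    """Count requests per user agent for agents that match no whitelist token."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the expected format; skip it
        user_agent = match.group(1)
        if not any(token in user_agent.lower() for token in ALLOWED_TOKENS):
            counts[user_agent] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        '203.0.113.5 - - [10/Jun/2010:12:00:00 +0000] "GET /robots.txt HTTP/1.1" '
        '200 512 "-" "ExampleBot/1.0 (+http://example.org/bot.html)"',
        '198.51.100.7 - - [10/Jun/2010:12:00:05 +0000] "GET / HTTP/1.1" '
        '200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    ]
    for agent, hits in unlisted_agents(sample).most_common():
        print(hits, agent)
```

This would also list ordinary browsers, so in practice the output still needs a human look; a UA string that includes a URL describing the bot is what makes that review possible.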