Msg#: 4227707 posted 11:40 am on Nov 7, 2010 (gmt 0)
Many sites go to great lengths to prevent scrapers from stealing their content. These same sites also generally prevent the major search engines from caching their pages, using the robots noarchive tag: <meta name="robots" content="noarchive">
Notice how WebmasterWorld doesn't have a 'Cached' link in the SERPs; this is an example of noarchive in use. All the major search engines support it.
The reason is that a search engine cache is a well-known backdoor for scrapers, who can scrape your content through the cache instead of directly from your site.
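For reference, the tag lives in the page's <head>. Here's a minimal sketch (hypothetical helper names, Python's stdlib html.parser) of how a crawler might detect it before deciding whether to cache a page:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            # content is a comma-separated list, e.g. "noarchive, nofollow"
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())

def has_noarchive(html):
    """True if the page asks engines not to show a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives

page = '<html><head><meta name="robots" content="noarchive"></head><body></body></html>'
print(has_noarchive(page))  # True
```

A crawler that honours the tag would still index the page but would skip publishing the "Cached" copy, closing the backdoor described above.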
However, blekko, the new search engine, has decided that it will not respect the noarchive tag.
I approached blekko to ask them about this, and Robert Saliba of Blekko Inc said: "we think that the meta noarchive tag is counter to providing our users with transparent information regarding the ranking and display of search results."
Luckily, for web admins who do use the noarchive tag, he had a solution, adding: "We also want to respect the wishes of website administrators. Accordingly, we are making changes so that in the future, we will treat the meta noarchive tag as a meta noindex tag."
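In other words, under blekko's stated policy a noarchive page wouldn't just lose its cache link; it would be dropped from the index entirely. A hypothetical sketch of that mapping (function name is mine, not blekko's):

```python
def effective_directives(directives):
    """Hypothetical sketch of blekko's stated policy: treat the meta
    noarchive tag as if it were a meta noindex tag.

    `directives` is a set of lowercase robots meta tokens, e.g. {"noarchive"}.
    """
    effective = set(directives)
    if "noarchive" in effective:
        # blekko won't honour noarchive alone, so the page is excluded outright
        effective.add("noindex")
    return effective

print(effective_directives({"noarchive"}))  # contains both noarchive and noindex
```

The practical consequence for webmasters: on blekko, opting out of caching means opting out of the index altogether.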
Msg#: 4227707 posted 6:59 pm on Jan 6, 2011 (gmt 0)
We haven't crawled webmasterworld.com, and haven't crawled them through a 3rd-party API either. Just like Google and Bing, blekko includes results for sites it cannot crawl based on inbound anchor text. The snippet looks like it's from dmoz to me.
Pretty standard stuff in SEs for the past 10 years.
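The technique described above can be sketched roughly like this: when a URL can't be fetched, the engine synthesizes a result entry from the anchor text of inbound links, optionally borrowing a snippet from a directory listing such as dmoz. Function and parameter names here are illustrative, not blekko's actual code:

```python
def synthesize_entry(url, inbound_anchors, directory_description=None):
    """Hypothetical sketch: build a search result for an uncrawlable URL.

    `inbound_anchors` is a list of anchor-text strings from links pointing
    at the URL; the most common one stands in for a title. A directory
    description (e.g. from dmoz) stands in for the snippet.
    """
    if inbound_anchors:
        title = max(inbound_anchors, key=inbound_anchors.count)
    else:
        title = url  # nothing better to show
    return {
        "url": url,
        "title": title,
        "snippet": directory_description or "",
    }

entry = synthesize_entry(
    "http://example.com/",
    ["Example Site", "Example Site", "click here"],
    directory_description="A sample directory description.",
)
print(entry["title"])  # Example Site
```

This is why a site can appear in the SERPs, with a plausible-looking title and snippet, even though the engine has never fetched a single page from it.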