
Forum Moderators: bakedjake


Blekko Does Not Honor NOARCHIVE?

11:40 am on Nov 7, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
votes: 8

Many sites go to great lengths to prevent scrapers from stealing their content.
These same sites also generally prevent the major search engines from caching their pages, using the robots noarchive meta tag:
<meta name="robots" content="noarchive">

Notice how WebmasterWorld doesn't have a 'cached' link in the SERPs. This is an example of noarchive in use; all the major search engines support it.

The reason is that a search engine cache is a well-known backdoor for scrapers, who can scrape your content through the cache instead of directly from your site.
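
For anyone who wants to check what a page actually asks for, here is a minimal sketch in Python (standard library only) of how a well-behaved crawler might read the robots meta tag before offering a cached copy. The class and function names are made up for illustration, and real engines may handle more cases (bot-specific meta names, the X-Robots-Tag header, and so on).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # content is a comma-separated list, e.g. "noarchive, nofollow".
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in the page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_show_cached_copy(html):
    """A crawler honoring noarchive would offer a cache link only if this is True."""
    return "noarchive" not in robots_directives(html)


page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print(may_show_cached_copy(page))
```

The check is case-insensitive on purpose, since pages in the wild write the tag as NOARCHIVE, NoArchive, and so on.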

However, blekko, the new search engine, has decided that it will not respect the noarchive tag.

I approached blekko to ask them about this, and Robert Saliba of Blekko Inc. said:
"we think that the meta noarchive tag is counter to providing our users with transparent information
regarding the ranking and display of search results."

Luckily, though, for web admins who do use the noarchive tag, he had a solution, as he also said this:
"We also want to respect the wishes of website administrators. Accordingly,
we are making changes so that In the future, we will treat the meta
noarchive tag as a meta noindex tag."
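
To make the announced change concrete, here is a rough sketch in Python of a crawler-side policy that folds noarchive into noindex. The function names and the `noarchive_means_noindex` flag are hypothetical, purely for illustration; nothing here reflects blekko's actual code.

```python
def effective_directives(directives, noarchive_means_noindex=True):
    """Normalize robots directives and optionally treat noarchive as noindex.

    Hypothetical policy sketch: when the flag is set, any page carrying
    noarchive is also treated as if it carried noindex.
    """
    normalized = {d.strip().lower() for d in directives if d.strip()}
    if noarchive_means_noindex and "noarchive" in normalized:
        normalized.add("noindex")
    return normalized


def may_index(directives, noarchive_means_noindex=True):
    """True if the page may appear in the index under the chosen policy."""
    return "noindex" not in effective_directives(directives, noarchive_means_noindex)


print(may_index(["noarchive"]))                                 # policy on
print(may_index(["noarchive"], noarchive_means_noindex=False))  # policy off
```

Under that policy, a site using noarchive would drop out of the index entirely rather than being listed without a cache link.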
2:55 pm on Jan 5, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
votes: 8

>>Go check out WebmasterWorld's cache pages on blekko:

interestingly the top 2 results are for




despite the fact that webmasterworld.com is redirected to www.WebmasterWorld.com
6:59 pm on Jan 6, 2011 (gmt 0)

New User

10+ Year Member

joined:Apr 23, 2004
votes: 0

We haven't crawled webmasterworld.com, and we haven't crawled it through a 3rd party API either. Just like Google and Bing, blekko includes results for sites it cannot crawl, based on inbound anchor text. The snippet looks like it's from dmoz to me.

Pretty standard stuff in SEs for the past 10 years.
10:40 pm on Jan 6, 2011 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
votes: 88

Forget the word "crawl"; you provided results via a 3rd party API,

right from your site...

additional web results
1 to 20 /yahoo results for site:webmasterworld.com

1. Skype Files For IPO Webmaster General forum at WebmasterWorld
like | seo links cache | spam

The number of shares to be offered and the price range for the offering have not ... A registration statement relating to the securities has been filed with the ...


2. Google Gift Has Arrived Google AdSense forum at WebmasterWorld
like | seo links cache | spam

Santa looked like a fedex driver google gift has arrived ... all rejected with a razor blade waiting to open a vein if that Google gift didn't show up the day before xmas. ...


3. Topix Eliminates Fee for Expedited Content Review Community ...
like | seo links cache | spam

Abusive posts will be dealt with more quickly topix eliminates fee for expedited content review


4. Experimenting With AdSense For Mobile Browsers
like | seo links cache | spam


So on and so forth.

Blocked by robots, yet results still displayed, hmmmm...

... and NOARCHIVE worked properly via the Yahoo result sets. :)
11:39 pm on Jan 6, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
votes: 0


For those of us who have chosen EITHER to ban your bot's IPs, OR to disallow your bots via robots.txt, or have done BOTH:

Will ALL content gathered from our sites BEFORE the blocking methods were put in place be removed from your search engine?

This is the second time I have asked you.

Kindly confirm for all of us, just so there's absolutely no room for misunderstandings.
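
For reference, a minimal sketch of the robots.txt side of that block, checked with Python's standard `urllib.robotparser`. The user-agent token `ScoutJet` is an assumption about what blekko's crawler calls itself, and the example.com URL is a placeholder; check your own logs for the real token.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: shut out one named bot, leave everyone else alone.
# "ScoutJet" is an assumed user-agent token, not confirmed in this thread.
ROBOTS_TXT = """\
User-agent: ScoutJet
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "http://www.example.com/forum/page.htm"
print(parser.can_fetch("ScoutJet", url))      # the blocked bot
print(parser.can_fetch("SomeOtherBot", url))  # any other bot
```

Note that robots.txt only governs future fetches. It says nothing about content gathered before the rule went in, which is exactly why the question above needs an answer.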