Forum Library, Charter, Moderators: bakedjake

Alternative Search Engines Forum

This 64 message thread spans 3 pages; this is page 3.
Blekko Does Not Honor NOARCHIVE?
topr8




msg:4227709
 11:40 am on Nov 7, 2010 (gmt 0)

Many sites go to great lengths to prevent scrapers from stealing their content.
These same sites also generally prevent the major search engines from caching their pages, using the robots noarchive tag:
<meta name="robots" content="noarchive">

notice how WebmasterWorld doesn't have a 'cached' link in the SERPs: this is an example of noarchive in use, and all the major search engines support it.

the reason is that a search engine cache is a well-known backdoor for scrapers, who can scrape your content through the cache instead of directly from your site.
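
for anyone who wants to check whether a page actually carries the tag, here is a minimal sketch in Python using only the standard library. The names RobotsMetaParser and robots_directives are made up for illustration, and this is not how any particular search engine parses the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when an attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print("noarchive" in robots_directives(page))
```

a page that asks not to be cached will have "noarchive" in the returned set.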

however blekko, the new search engine, has decided that it will not respect the noarchive tag.

I approached blekko to ask them about this, and Robert Saliba of Blekko Inc. said:
"we think that the meta noarchive tag is counter to providing our users with transparent information
regarding the ranking and display of search results."

luckily though, for web admins who do use the noarchive tag, he had a solution, as he also said this...
"We also want to respect the wishes of website administrators. Accordingly,
we are making changes so that In the future, we will treat the meta
noarchive tag as a meta noindex tag."
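
to make the difference concrete, here is a rough sketch of the two behaviours: the conventional handling, where noarchive only hides the cached copy, and the handling described in the quote, where noarchive is treated like noindex. Both function names are hypothetical and this is not blekko's code; it only models what the quote says.

```python
def conventional_policy(directives):
    """noarchive hides the cached copy but the page can still be indexed."""
    index = "noindex" not in directives
    show_cache = index and "noarchive" not in directives
    return {"index": index, "show_cache": show_cache}


def noarchive_as_noindex_policy(directives):
    """noarchive is treated the same as noindex: the page is dropped entirely."""
    index = not ({"noindex", "noarchive"} & set(directives))
    return {"index": index, "show_cache": index}


tags = {"noarchive"}
print(conventional_policy(tags))
print(noarchive_as_noindex_policy(tags))
```

under the second policy a site that only wanted to hide its cached pages would disappear from the index altogether.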

 

topr8




msg:4249376
 2:55 pm on Jan 5, 2011 (gmt 0)

>>Go check out WebmasterWorld's cache pages on blekko:

interestingly the top 2 results are for

WebmasterWorld.com

and

www.WebmasterWorld.com

despite the fact that webmasterworld.com is redirected to www.WebmasterWorld.com

skrenta




msg:4249954
 6:59 pm on Jan 6, 2011 (gmt 0)

We haven't crawled webmasterworld.com and haven't crawled them through a 3rd party API either. Just like Google and Bing, blekko includes results for sites it cannot crawl based on inbound anchor text. The snippet looks like it's from dmoz to me.

Pretty standard stuff in SEs for the past 10 years.

incrediBILL




msg:4250063
 10:40 pm on Jan 6, 2011 (gmt 0)

Forget the word "crawl"; you provided results via a 3rd party API

right from your site...

additional web results
1 to 20 /yahoo results for site:webmasterworld.com

1. Skype Files For IPO Webmaster General forum at WebmasterWorld
like | seo links cache | spam

The number of shares to be offered and the price range for the offering have not ... A registration statement relating to the securities has been filed with the ...

webmasterworld.com/webmaster/4184250.htm

2. Google Gift Has Arrived Google AdSense forum at WebmasterWorld
like | seo links cache | spam

Santa looked like a fedex driver google gift has arrived ... all rejected with a razor blade waiting to open a vein if that Google gift didn't show up the day before xmas. ...

webmasterworld.com/google_adsense/3165757.htm

3. Topix Eliminates Fee for Expedited Content Review Community ...
like | seo links cache | spam

Abusive posts will be dealt with more quickly topix eliminates fee for expedited content review

webmasterworld.com/community_building/4185328.htm

4. Experimenting With AdSense For Mobile Browsers
like | seo links cache | spam

webmasterworld.com/google_adsense/4153626.htm



So on and so forth.

Blocked by robots, yet results still displayed, hmmmm...

... and NOARCHIVE worked properly via the Yahoo result sets. :)

Angonasec




msg:4250091
 11:39 pm on Jan 6, 2011 (gmt 0)

@skrenta:

Those of us who have chosen to EITHER ban your bot's IPs, or have disallowed your bots via robots.txt, or have done BOTH...

Will ALL content gathered from our site BEFORE the blocking methods were put in place be removed from your search engine?

This is the second time I have asked you.

Kindly confirm for all of us, just so there's absolutely no room for misunderstandings.
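
For anyone setting up the robots.txt side of this, here is a minimal sketch for checking that a disallow rule does what you expect, using Python's standard urllib.robotparser. The user-agent name "ExampleBot", the sample robots.txt and the example.com URL are placeholders, not the real bot name or any real site's file. Note that this only tells you what a well-behaved bot may fetch from now on; it says nothing about content gathered before the rule was added, which is the question above.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: block one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://www.example.com/forum/page.htm"
print(is_allowed(ROBOTS_TXT, "ExampleBot", url))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```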


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved