Alternative Search Engines Forum

Blekko does not appear to honor ROBOTS.TXT
incrediBILL
msg:4249216, 3:09 am on Jan 5, 2011 (gmt 0)

Spun off from this thread about blekko and NOARCHIVE:
[webmasterworld.com...]

I discovered that WebmasterWorld was listed on blekko, and that blekko appeared to honor NOARCHIVE for the WebmasterWorld pages it showed.

Go check out WebmasterWorld's cache pages on blekko:
[blekko.com...]

I see snippets, but when I click "cache" I get "Error: No content", so for WebmasterWorld blekko appears to support NOARCHIVE while keeping the snippets in the index.
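
For reference, NOARCHIVE is normally set with a robots meta tag such as <meta name="robots" content="noarchive">, which asks an engine not to offer a cached copy while still letting the page be indexed and snippeted. The sketch below, using only Python's standard html.parser, shows one way a crawler could detect that directive. The class and function names are hypothetical, and nothing in this thread says how blekko actually implemented it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def may_show_cached_copy(html: str) -> bool:
    """Hypothetical helper: False if the page opts out of caching via NOARCHIVE."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noarchive" not in parser.directives


page = '<html><head><meta name="robots" content="NOARCHIVE"></head><body>...</body></html>'
print(may_show_cached_copy(page))  # False
```

Because NOARCHIVE only governs the cached copy, an engine following it can still show a snippet for the page, which matches what shows up for WebmasterWorld on blekko.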


Then I found out that WebmasterWorld blocks blekko with robots.txt: blekko is banned from crawling WebmasterWorld, and the listings are being pulled from some 3rd-party SE API.

How do you opt out of blekko?

Apparently you can't.

Status_203
msg:4249301, 10:17 am on Jan 5, 2011 (gmt 0)

I didn't have time to start a new topic yesterday, but after reading the blekko thread I also discovered that at least one of my sites appears to be in blekko (not just the home page, which could be listed without crawling), even though blekko has never been whitelisted (i.e. blekko would be receiving my bog-standard disallow-all robots.txt).
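
A whitelist-style robots.txt like the one described here gives named crawlers their own group and ends with a catch-all group that disallows everything. The sketch below uses Python's standard urllib.robotparser to show how a compliant crawler evaluates such a file. The file contents, the example.com URL and the "ExampleBot" name are assumptions for illustration; ExampleBot stands in for any crawler that isn't whitelisted, and blekko's real user-agent token isn't given in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist-style robots.txt: Googlebot may crawl everything
# (an empty Disallow means "allow all"); every other user-agent falls
# through to the catch-all group, which disallows the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "ExampleBot"):
    allowed = parser.can_fetch(agent, "https://www.example.com/some/page.html")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
# Googlebot: allowed
# ExampleBot: disallowed
```

Note that robots.txt only controls fetching. A crawler that obeys this file never requests the pages, but nothing in the file stops an engine from listing URLs it learned about from other sites.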

skrenta
msg:4249948, 6:32 pm on Jan 6, 2011 (gmt 0)

blekko strictly honors robots.txt. We do not crawl webmasterworld.com, as evidenced by the lack of a cached page in our serps.

Like Google and Bing, however, we still include serps for sites which we can't crawl if we have enough inbound anchortext material. Snippets can be obtained from anchortext, dmoz or other metadata without crawling the site.
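
A rough illustration of that point: if an engine has crawled pages that link to a site, it can assemble a listing for the target URL from those inbound links alone, without ever fetching the target. The sketch below simply groups anchor text by target URL and keeps targets with a minimum number of inbound anchors. The link data, the function name and the min_anchors threshold are all hypothetical and say nothing about how blekko actually builds its snippets.

```python
from collections import defaultdict
from typing import Iterable

# Each tuple: (crawled source page, target URL it links to, anchor text).
Link = tuple[str, str, str]


def listings_from_anchors(links: Iterable[Link], min_anchors: int = 2) -> dict[str, list[str]]:
    """Group anchor text by target URL; keep targets with enough inbound anchors.

    Only the source pages were fetched. The target pages never are, so a
    robots.txt block on the target host does not keep them out of this result.
    """
    anchors: dict[str, list[str]] = defaultdict(list)
    for _source, target, text in links:
        anchors[target].append(text)
    return {url: texts for url, texts in anchors.items() if len(texts) >= min_anchors}


links = [
    ("https://a.example/post", "https://forum.example/thread/1", "robots.txt discussion"),
    ("https://b.example/links", "https://forum.example/thread/1", "webmaster forum thread on robots.txt"),
    ("https://c.example/", "https://other.example/", "other site"),
]
print(listings_from_anchors(links))
# {'https://forum.example/thread/1': ['robots.txt discussion', 'webmaster forum thread on robots.txt']}
```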

incrediBILL
msg:4250535, 12:39 am on Jan 8, 2011 (gmt 0)

I didn't say blekko doesn't honor robots.txt; I said it APPEARS not to honor robots.txt.

I can block you in robots.txt, yet my site and all its SEO data still appear.

Plus, you display SERPs pulled from a 3rd-party Yahoo API which, at first glance (if you miss the disclaimer saying they're Yahoo results), look like pages you crawled.

Not only that, the SEO section for WebmasterWorld claims you crawled it. HUH?
Crawled: 11h ago
Pages Seen: 7,743
Pages Crawled: 811
Hostrank: 670.6
Avg Page Length: 0
Avg Page Latency:
Robots: [webmasterworld.com...] (last fetched: 26d ago)
Home Page: [webmasterworld.com...]


So the question is: if robots.txt doesn't really opt you out fully, how the heck do you completely opt out of blekko?

No Yahoo API results, no SEO, nada.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved