Blekko does not appear to honor ROBOTS.TXT
3:09 am on Jan 5, 2011 (gmt 0)
Spun off from this thread about blekko and NOARCHIVE:
I discovered WebmasterWorld was on blekko, and blekko appeared to honor NOARCHIVE for the WebmasterWorld pages it showed.
Go check out WebmasterWorld's cache pages on blekko:
I see snippets, and when I click "cache" I get "Error: No content". So for WebmasterWorld, blekko appears to support NOARCHIVE while keeping the snippets in its index.
Then I found out that WebmasterWorld blocks blekko in robots.txt: blekko is banned from crawling WebmasterWorld, and the listings are being pulled from some third-party search engine API.
How do you opt out of blekko?
Apparently you can't.
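For reference, NOARCHIVE is the robots meta directive asking search engines not to show a cached copy of a page. A rough Python sketch of checking whether a page requests it, assuming the directive is given in a standard `<meta name="robots">` tag (it ignores the X-Robots-Tag HTTP header and bot-specific meta names); `has_noarchive` is a hypothetical helper name:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # Directives are a comma-separated, case-insensitive list.
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def has_noarchive(html: str) -> bool:
    """Return True if the page's robots meta tag includes NOARCHIVE."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


page = '<html><head><meta name="robots" content="NOARCHIVE"></head></html>'
print(has_noarchive(page))  # True
```

This only tells you what the page asks for; whether an engine drops its cached copy (while still showing snippets, as above) is entirely up to the engine.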
10:17 am on Jan 5, 2011 (gmt 0)
I didn't have time to start a new topic yesterday, but after reading the blekko thread I also discovered that at least one of my sites appears in blekko (and not just the home page, which could be listed without crawling), even though it has never been whitelisted. In other words, blekko would be receiving the bog-standard disallow-all robots.txt.
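A whitelist-style robots.txt like the one described allows a few named crawlers and disallows everyone else. A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical whitelist of Googlebot and bingbot; `UnlistedBot` is a made-up user agent standing in for any crawler not on the list:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist robots.txt: named crawlers may fetch everything
# (an empty Disallow means "allow all"); every other agent gets "Disallow: /".
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(WHITELIST_ROBOTS_TXT, "Googlebot", "https://example.com/page.html"))    # True
print(may_crawl(WHITELIST_ROBOTS_TXT, "UnlistedBot", "https://example.com/page.html"))  # False
```

Note that robots.txt only governs fetching: a disallowed engine can still list a URL it learned about from links elsewhere, which is the point blekko makes in its reply.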
6:32 pm on Jan 6, 2011 (gmt 0)
blekko strictly honors robots.txt. We do not crawl webmasterworld.com, as evidenced by the lack of a cached page in our SERPs.
Like Google and Bing, however, we still list sites we can't crawl in our SERPs if we have enough inbound anchor text for them. Snippets can be obtained from anchor text, DMOZ or other metadata without crawling the site.
12:39 am on Jan 8, 2011 (gmt 0)
I didn't say blekko doesn't honor robots.txt; I said it APPEARS not to honor robots.txt.
I can block you in robots.txt, yet my site and all its SEO data still appear.
Plus, you display SERPs pulled from a third-party Yahoo API, which at first glance (if you miss the disclaimer saying they're Yahoo results) makes it look like you crawled the site.
Not only that, the SEO section for WebmasterWorld claims you crawled it. HUH?
Crawled: 11h ago
Pages Seen: 7,743
Pages Crawled: 811
Avg Page Length: 0
Avg Page Latency:
Robots: [webmasterworld.com...] (last fetched: 26d ago)
Home Page: [webmasterworld.com...]
So the question is: if robots.txt doesn't fully opt you out, how the heck do you completely opt out of blekko?
No Yahoo API results, no SEO data, nada.