dstiles - 8:09 pm on Jun 19, 2013 (gmt 0)
Any SE that tries to use distributed crawling (e.g. Majestic) is a non-starter in my book. There is no real way to control the scraping activities of bots that fake their UA (yes, I know Majestic has a solution, but it's the wrong approach!).
If potential SE operators would like to post tech details here (or, better still, in the search engine section of this site), I for one would look favourably on letting them onto my server, assuming they are not parasites.
Part of the problem on the data-gathering side is a lack of webmaster-targeted information on a) SE names/URLs; b) crawling bot UAs; c) IP ranges (if possible). Some of the SEs I have looked at (including DDG) seem to publish very sparse information.
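To illustrate why published UAs and IP ranges matter, here is a minimal sketch of the kind of check a webmaster might run on each request. It assumes a hand-maintained list pairing each crawler's UA token with its published IP ranges; the bot names ("ExampleBot", "SampleSearchBot") and ranges are hypothetical placeholders taken from the RFC 5737 documentation blocks, not real search-engine data.

```python
import ipaddress

# Hypothetical registry: UA token -> IP ranges the operator publishes.
# Names and ranges are illustrative placeholders (RFC 5737 documentation
# blocks), not real search-engine data.
KNOWN_BOTS = {
    "ExampleBot": [ipaddress.ip_network("192.0.2.0/24")],
    "SampleSearchBot": [ipaddress.ip_network("198.51.100.0/25")],
}


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Return 'verified-bot', 'fake-bot' or 'unknown' for one request."""
    addr = ipaddress.ip_address(remote_ip)
    for bot_name, networks in KNOWN_BOTS.items():
        if bot_name in user_agent:
            # The UA claims to be a listed crawler: accept it only if the
            # source IP falls inside one of that crawler's published ranges.
            if any(addr in net for net in networks):
                return "verified-bot"
            return "fake-bot"
    # No listed crawler token in the UA: handle under the site's normal rules.
    return "unknown"
```

For example, a request with UA `Mozilla/5.0 (compatible; ExampleBot/1.0)` from 192.0.2.17 is classified as a verified bot, while the same UA from 203.0.113.5 is flagged as fake. A crawler that runs from many unrelated addresses, as a distributed crawler does, cannot publish a fixed range, so this kind of check has nothing to compare against.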