I posted here a while back when someone was asking about the YioopBot user-agent, so I figured I'd do a follow-up. I am excited because my crawler has just completed its first billion-page crawl. Here is my blog post about the crawl:
[yioop.com...]
The index can be found at
[yioop.com...]
The software that does the crawling and indexing and serves the web app is written in PHP and does not rely on any other crawling or indexing project. It is GPLv3 and can be downloaded from
[seekquarry.com...]
My search engine currently runs off six 2011 Mac Minis in my home over a Comcast business connection. If you read the original PageRank paper by Brin and Page, they mention that they imagine prices coming down in the future so that pretty much anyone could do a web-scale crawl. Currently, I would say the cost to do such a crawl is 3 to 4 thousand dollars in equipment and internet costs, excluding labor. My guess is this price will continue to fall. It will probably be a few months before I do another crawl, as I want to improve my indexer and web app first.