I've thought long and hard about this, and the challenges facing a small startup with a very limited budget would make such a project practically unachievable. Fifteen years ago it would have been a lot easier. Crawling the web and keeping that content fresh is a huge task in and of itself.
Rather than trying to be the next Google, why not aggregate the web (like the Internet Archive does) using bots and other data sources, then lease that data back to small search startups? It could be an open source project of some sort. That would let search engine startups focus their entire budgets on search algorithms, instead of sinking costly, time-consuming resources into crawling. Eventually, every site on the web could have its own custom, refined search engine, and there would be no need for Google.
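To make the division of labor concrete: a startup leasing pre-crawled data only has to build the index-and-rank layer. Here's a minimal sketch in Python, where `crawled_docs` stands in for documents delivered by a hypothetical shared crawl corpus (in practice this would be WARC archives or an API feed), and the startup's own code is just the inverted index and query logic:

```python
from collections import defaultdict

# Stand-in for documents leased from a shared crawl corpus
# (hypothetical data; real feeds would be WARC files or an API).
crawled_docs = {
    "example.com/a": "open source search engines need fresh crawl data",
    "example.com/b": "a shared crawl corpus lets startups skip crawling",
    "example.com/c": "search algorithms are where startups should compete",
}

def build_index(docs):
    """Build a simple inverted index: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return doc ids containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

index = build_index(crawled_docs)
print(sorted(search(index, "crawl startups")))  # → ['example.com/b']
```

All the interesting competition then happens in `search` (ranking, relevance, freshness heuristics), while the crawl itself is a shared commodity, much like how many projects today build on Common Crawl rather than crawling the web themselves.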