TypicalSurfer - 6:24 pm on Oct 6, 2012 (gmt 0)
Actually, you are correct. The biggest challenge in large-scale document retrieval is getting structured documents. The knee-jerk reaction to Google is always "what a great search engine," but truth be told, a lot of that credit rightfully belongs to the content creators for structuring their documents in a way that overcomes a search engine's biggest challenge.
This is why I maintain that significant competition in the search space will come sooner rather than later: it's ridiculously easy to make sense of the web now. If you had run a web crawler ten years ago, you'd have been hard-pressed to make sense of what you crawled; a lack of proper titles, document names, etc. was the rule, not the exception. It's a different world now.