Xerox Launches FactSpotter Semantics-Based Enterprise Search Engine
Researchers at Xerox Corporation today unveiled FactSpotter, new document search software that goes beyond conventional "keyword" search, enabling it, in effect, to spot the one or two golden nuggets among the pebbles on the shore.
"Our advanced search engine goes beyond today's typical 'keyword' search or current data-mining programs, which typically end up searching only 40 percent of all the documents that are relevant because the keywords are too limiting," said Frédérique Segond, manager of parsing and semantics research at XRCE. "Xerox's tool is more accurate because it delves into documents, extracting the concepts and the relationships among them. By 'understanding' the context, it returns the right information to the searcher, and it even highlights the exact location of the answer within the document."
It is fairly clear that it will not scale to WWW levels of tens of billions of pages: they are gunning for the enterprise search niche, where the number of documents is measured in millions rather than billions. Tough problems like dedicated keyword spamming are not really an issue there.
It is one (simple) thing to return the Top 10 results from the maybe 50-100k matches you will get from a collection of 1 mln documents, and a completely different (very complex) problem to return the same Top 10 from a billion qualifying pages. You can have a brilliantly smart algorithm that works beautifully on 1 mln clean pages, but if you try to scale it to billions of dirty ones you will find that you need so much hardware that it will not be feasible for a long time.
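To make the Top 10 point concrete, here is a minimal Python sketch, assuming every match already carries a relevance score. The names `top_k` and `fake_matches` are made up for illustration and have nothing to do with how FactSpotter or any real engine works. Picking the best 10 needs only a tiny heap, but every single match still has to be scored and looked at once, so the work grows with the number of matches, not with the size of the result page.

```python
import heapq
import random


def top_k(scored_matches, k=10):
    """Return the k highest-scoring (score, doc_id) pairs, best first.

    Keeps a min-heap of at most k items, so memory stays O(k), but every
    match is still visited once: the work grows linearly with match count.
    """
    heap = []
    for item in scored_matches:
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # New item beats the weakest of the current top k: swap it in.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


def fake_matches(n, seed=0):
    """Hypothetical stand-in for a real scorer: yields (score, doc_id)."""
    rng = random.Random(seed)
    for doc_id in range(n):
        yield (rng.random(), doc_id)


# 100k matches, roughly the enterprise-scale case described above.
results = top_k(fake_matches(100_000), k=10)
for score, doc_id in results:
    print(f"doc {doc_id}: {score:.6f}")
```

The selection step is the cheap part. With a billion qualifying pages the same loop runs ten thousand times longer, and a semantic scorer that parses each document is far more expensive per match than the random number used here.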
As far as the WWW is concerned, FactSpotter is about as much of an alternative search engine as Google Desktop Search is.
[edited by: Lord_Majestic at 5:09 pm (utc) on June 21, 2007]
It's an enterprise SE, so it's not meant to be a search engine in the sense of Google's, Microsoft's, Yahoo's & Ask's.
I agree, scalability is an issue with any search service.
Indeed, and as such it does not seem to fit the charter of this forum (which references only WWW search engines, even giving a guideline of 20-50 mln pages as the minimum index size). Even criterion #2 implies that the search engines in question should be "those that can drive traffic", which enterprise search naturally won't provide. But this is my view, and you of course know the rules better; I won't be arguing about them :)