---- Google's Amit Singhal Interview: Developing The Knowledge Graph
ergophobe - 4:52 pm on Feb 14, 2012 (gmt 0)
>> pure text match btw?
It doesn't and can't differentiate between different meanings of the same word, it doesn't understand context, and it can't find articles on iMac computers if I search on Mac computer or Apple computer. Pure text match is handy to have as an option ("verbatim"), which I use a lot (I may actually be looking for a misspelling of a word), but it's not usually what you want.
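To make the iMac example concrete, here's a toy sketch (my own invented documents and synonym table, nothing like Google's actual pipeline) of why verbatim matching misses pages that synonym expansion finds:

```python
# Toy sketch: verbatim matching vs. a simple synonym expansion.
docs = {
    1: "review of the new imac all in one desktop",
    2: "mac computer buying guide",
}

# Hypothetical synonym sets; a real engine learns these relations.
SYNONYMS = {
    "mac": {"mac", "imac", "macintosh"},
    "computer": {"computer", "desktop", "machine"},
}

def verbatim_match(query):
    """Doc ids where every query term appears literally."""
    terms = query.lower().split()
    return [i for i, text in docs.items()
            if all(t in text.split() for t in terms)]

def expanded_match(query):
    """Doc ids where every term, or one of its synonyms, appears."""
    hits = []
    for i, text in docs.items():
        words = set(text.split())
        if all(words & SYNONYMS.get(t, {t}) for t in query.lower().split()):
            hits.append(i)
    return hits

print(verbatim_match("mac computer"))  # [2] -- the iMac page is invisible
print(expanded_match("mac computer"))  # [1, 2]
```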
Long term, what I want is the Star Trek computer.
Me: Computer?
Computer: Working.
Me: Which planets are within 20,000 light years and have a gaseous atmosphere with an atmospheric nitrogen concentration higher than 18% and an atmospheric methane concentration lower than 22%?
Computer: There are seven such planets. The closest one is Aldebaran II-a, 256 light years away, with an atmospheric...
Not only is this what I *want* long term, I'm quite sure that something like this (though with a much better interface) is what I'll *get* long term. Or at least the kids in college now will live to see this.
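Part of why I'm confident is that the hard part of that fantasy isn't the dialogue; once the facts sit in a structured store, the question is just a filter and a sort. A minimal sketch, with planet data I've made up:

```python
# Sketch of the "Star Trek computer" question as a structured query.
# The planet data is invented; the point is that the answer comes from
# filtering facts, not from matching text on a page.
planets = [
    {"name": "Aldebaran II-a", "distance_ly": 256,
     "atmosphere": {"N2": 0.21, "CH4": 0.05}},
    {"name": "Vega IV", "distance_ly": 25_000,
     "atmosphere": {"N2": 0.30, "CH4": 0.10}},
    {"name": "Rigel VII", "distance_ly": 860,
     "atmosphere": {"N2": 0.05, "CH4": 0.40}},
]

matches = sorted(
    (p for p in planets
     if p["distance_ly"] <= 20_000
     and p["atmosphere"]["N2"] > 0.18
     and p["atmosphere"]["CH4"] < 0.22),
    key=lambda p: p["distance_ly"],
)

for p in matches:
    print(f'{p["name"]}: {p["distance_ly"]} light years')
# Aldebaran II-a: 256 light years
```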
>> Right now, experienced searchers might not even formulate a search such as "the ten deepest lakes in the US" but would probably come at this idea from related searches that stand a better chance of finding a text match.
Actually Ted, I think I disagree, though I'm not 100% certain I know what you mean. I think of myself as an experienced searcher, but over the last couple of years I have found that more and more of my searches are shifting to natural language, with articles and pronouns and so forth.
I'll give you an example from this morning. I wanted to find the OBD port for a 2004 Subaru Forester. The thing is, I didn't have the word "port" in mind, and my first old-style keyword search failed. So I fell back on what I consider my new strategy: I typed in my query as if I were talking to a human mechanic and asked "Where do I plug an OBD reader into a Subaru Forester?" Yes, of course I know half of those are stop words, but guess what? An appropriate page, with a diagram and everything, popped up at #1.
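To be clear about what I mean by stop words, here's roughly what old-style keyword reduction does to that query (the stop list is my own, not Google's):

```python
# Rough sketch of stop-word stripping; the stop list is illustrative.
STOP_WORDS = {"where", "do", "i", "a", "an", "the", "into", "to"}

def to_keywords(query):
    """Reduce a natural-language query to its content terms."""
    return [w for w in query.lower().replace("?", "").split()
            if w not in STOP_WORDS]

print(to_keywords("Where do I plug an OBD reader into a Subaru Forester?"))
# ['plug', 'obd', 'reader', 'subaru', 'forester']
```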
I have deep uneasiness about what Google knows about me (and Facebook and Visa and US Bank and REI, for that matter), but I think search technology is noticeably improved from five years ago, and by a lot. The change is less perceptible than it should be because even if the search tech is 3x better, the search engines face a web that has grown far faster than that. Estimates vary, but if we believe Google's official statements [source 1] about the number of unique URLs they have indexed (which is not the number of unique pages, of course):
Number of pages in the Google index:
1999: 26 million
2000: 1 billion
2008: 1 trillion
So yeah, the SERPs often seem polluted with irrelevant and overlapping garbage. The problem is that there is massively more noise in the system than there was in 2000, so the filters have to be massively better. In fact, they are only considerably better, not massively, game-changingly better.
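Just to put numbers on "massively," using the index figures above, the haystack grew by factors in the tens of thousands while the filters got maybe single-digit multiples better:

```python
# Back-of-the-envelope on the figures cited below [source 1].
index_size = {1999: 26e6, 2000: 1e9, 2008: 1e12}

for year in (2000, 2008):
    growth = index_size[year] / index_size[1999]
    print(f"1999 -> {year}: {growth:,.0f}x more URLs")
# 1999 -> 2000: 38x more URLs
# 1999 -> 2008: 38,462x more URLs
```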
So all of this is a roundabout way of saying: this is why you change something that's working. What works when searching 26 million URLs simply won't give a reasonable result with 1 trillion unique URLs.
1. Source: [googleblog.blogspot.com...] For comparison, in 2005 the "Surface Web" (i.e. the indexable web) was estimated at 11.5 billion pages.