|20% to 25% of queries are new each day|
From Udi Manber - VP of Engineering at Google
Here's a fascinating bit of information, taken from Udi Manber's recent talk at Supernova, "Search is a Hard Problem." Udi is Google's VP of Engineering.
That is pretty mind blowing when you think about the challenge it implies to all search engines. And here we sit, all focused on those high volume searches!
The article has some other interesting points, but this one just jumped out. Thanks to Barry Schwartz at SearchEngineLand for [url=http://searchengineland.com/070622-085337.php]tipping us off[/url] to the ReadWriteWeb article. [readwriteweb.com]
It's pretty easy from where we sit to criticize Google (or Ask, Yahoo, MSN, whoever), but that's often like being a Monday morning quarterback. I've had my hand in some corporate "Knowledge Base" search and also site search for large domains. That alone has been enough to give me great respect for what it takes.
And just imagine trying to coordinate all those PhDs and other engineers! That first page of ten search results looks like such a simple thing, but what must go on behind the scenes to create it. Sheesh!
This talk by Udi Manber reminds me of another paper from Googler Anna Lynn Patterson, Why Writing Your Own Search Engine is Hard [acmqueue.com].
|20 to 25% of the queries we see today, we have never seen before|
That just seems like an astonishingly high percentage.
I'm trying to wrap my head around what a number like that means when it comes to writing content. Of course some of those new queries will be related to current events in the news. But that probably leaves more than a few that relate to existing content on the web.
How to capture those?
In order to make sense of that number, I have to think that it's 25% with each distinct query weighted as 1, not counted by however many times they see those particular words. What I'm thinking is this -- even if a query is seen 1 billion times in a day, it's still just one query.
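To make that counting distinction concrete, here's a toy sketch (with made-up sample queries and an assumed "seen before" set) showing how the percentage differs when you count distinct queries rather than raw search volume:

```python
# Hypothetical toy log: each entry is one search event.
query_log = [
    "weather", "weather", "weather",   # high-volume head term, repeated
    "udi manber supernova talk",       # a one-off query
    "weather",
    "some new obscure phrase",         # another one-off
]

# Queries known from previous days (assumed for illustration).
seen_before = {"weather"}

distinct = set(query_log)
new_distinct = distinct - seen_before

# Weighted by distinct queries: 2 of 3 distinct queries are new,
# even though "weather" dominates the raw volume.
share_new = len(new_distinct) / len(distinct)
print(f"{share_new:.0%} of distinct queries are new")
```

Counted by volume, the "new" share here would be only 2 out of 6 events; counted by distinct queries, it's 2 out of 3 -- which is presumably closer to the sense Manber means.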
I don't know that there's any specific approach to target those search terms, except for the same things you would do to attract long tail terms and watch your logs for remarkably new hits.
A high percentage of these new queries are probably the obscure terms or the long tails created by various keyword combinations, new domains and names, etc.
Every other day, when I check something in the SERPs and run into trouble, I also check some unlikely combinations that have little or no meaning, or perhaps no value to any user or webmaster.
Since the SERPs are weighted by word order and number of words, adding a non-competitive word at the front, middle, or end of a competitive phrase gives you a control group that helps you pinpoint the problems.
Out of 10 searches I make for SEO, 9 are like this.
On the other hand, when I just do a simple web search, I often get results so bad that I have to make the query obscure, long, and detailed enough that it doesn't direct me to the first spammy forum. I mostly make technical searches. Three years back, copy-pasting an error message I'd never seen before, or just entering the key words and phrases, would get me relevant results and solutions; nowadays it's increasingly hard to dig deep into on-topic discussions and technical advice, because trusted comes first (too generic), spammy comes second (no info), and official comes third (sometimes helps, sometimes doesn't)...
So yes, I can imagine that at LEAST 20-25% of searches are brand new.
The work of half a million "SEOs" in an increasingly over-populated and filtered market, and Google, with every update, inching closer to becoming a generic directory instead of the expert SE it once was. Let them steer away from that path as soon as they find the balance to do so.
Perhaps the best way to understand the difficulty of Google's problem is to interview for a job there. I've done so several times, but have had no luck so far, largely because my degree isn't in computer science - all of my programming is self-taught.
Computer science isn't really about programming; it's more a branch of mathematics, concerned with calculating the time and memory required for computations involving very large data sets.
Software engineering, for the most part, is quite unrelated to that; it has been very rare in my twenty-year career that I've had to employ algorithm analysis in my work.
But Google, unlike the vast majority of software companies, has to deal with enormous data sets for just about everything it does.
Typical interview questions involve figuring out how to do things when the data won't all fit in memory, or how to distribute a problem across thousands of computers in a way that the load is shared equally by each.
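One standard trick behind that kind of question is consistent partitioning: hash each query so the same query always lands on the same machine, while queries overall spread roughly evenly. A minimal sketch, with a made-up machine count and sample queries (not Google's actual scheme):

```python
import hashlib

NUM_MACHINES = 4  # stand-in for thousands of servers

def shard_for(query: str) -> int:
    """Map a query to a machine: the same query always hashes to the
    same shard, and distinct queries spread roughly evenly."""
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_MACHINES

queries = ["weather", "udi manber", "search is a hard problem", "long tail phrase"]
assignments = {q: shard_for(q) for q in queries}
```

With real workloads you'd also have to handle hot queries and machines joining or leaving, which is where techniques like consistent hashing come in -- but the core idea of "hash to share the load" is this simple.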
To understand Google's computational problem: for any search engine to do its work, it has to store the entire World Wide Web on its internal hard drives, and analyze the lot of it for the answer to each query it gets.
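The reason that analysis is even feasible per-query is the inverted index: the engine pre-computes, for every word, the list of pages containing it, so answering a query means intersecting a few posting lists instead of re-reading the whole web. A toy sketch with made-up pages:

```python
from collections import defaultdict

# Toy "crawled web": page id -> text (assumed sample data).
pages = {
    "page1": "search is a hard problem",
    "page2": "writing your own search engine is hard",
    "page3": "monday morning quarterback",
}

# Inverted index: word -> set of pages containing it. Built once,
# ahead of time, so queries never scan the pages themselves.
index = defaultdict(set)
for page_id, text in pages.items():
    for word in text.split():
        index[word].add(page_id)

def search(query: str) -> set:
    """Return pages containing every word of the query (AND semantics)."""
    words = query.split()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Here `search("search hard")` intersects the postings for "search" and "hard", touching three pages' worth of index instead of re-analyzing the corpus -- multiply that by billions of pages and you have Google's storage-and-indexing problem.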