Forum Moderators: open
I'm guessing that poor search times are non-repeatable (since common results must get cached), but it'd be interesting to hear what really gets those pigeons pecking.
I would, however, be curious about what types of searches take longest for the servers to process, but I don't think the timing number on the SERPs reflects that very accurately.
If you go into the image search, you can hit 3-4 seconds regularly during peak hours. I just searched for "widgets.gif [images.google.com]" in the image search with filtering off and it took 12 seconds to tell me it didn't find anything (repeated several times with nearly the same results, and a minute later it was instantaneous).
Complicated Usenet searches can also take 4-5 seconds during the day, too.
They use clusters of index servers to distribute each query across groups of smaller machines that hold the actual data.
With over 2.8 billion web pages in its index, most queries are returned in under a second, which, even with unlimited resources, is an extremely impressive achievement, and it has played a major role in Google's success.
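To make the distribution idea concrete, here is a toy scatter/gather sketch in Python. This is my own illustration, not Google's actual code: the shard contents, scores, and doc IDs are all made up, but the pattern (fan a query out to every index shard in parallel, then merge the partial results) is the one described above.

```python
# Toy sketch of scatter/gather querying across index shards.
# Shard contents and scores are invented for illustration only.
from concurrent.futures import ThreadPoolExecutor

# Each "shard" is just a dict mapping a term to (doc_id, score) hits.
SHARDS = [
    {"widgets": [("doc1", 0.9), ("doc4", 0.3)]},
    {"widgets": [("doc7", 0.7)], "gears": [("doc2", 0.8)]},
    {"gears": [("doc5", 0.6)]},
]

def search_shard(shard, term):
    """Return this shard's hits for the term (empty list if none)."""
    return shard.get(term, [])

def search(term):
    """Scatter the query to every shard in parallel, then merge by score."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: search_shard(s, term), SHARDS)
    merged = [hit for part in partials for hit in part]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)

print(search("widgets"))  # [('doc1', 0.9), ('doc7', 0.7), ('doc4', 0.3)]
```

The point is that no single machine ever scans the whole 2.8-billion-page index; each shard answers for its own slice, so query latency is set by the slowest shard, not by the total data size.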
Clusters are not all that impressive any more - fairly routine tech these days. The glue that keeps the whole thing together is interesting, though. Just once, I'd like to hear a tech give us a "tell all" about the whole system. We've heard the macro picture from the user's perspective at the site, and we've heard some minutiae about data centers and actual hardware, but we've never heard a good overview of the whole system and how it glues together.
If we get down to the box, the part I'd like to hear more about is the mechanics of the data storage and retrieval end. Google has talked in one interview about their proprietary full disk spanning file system (random/relative access). I believe it was with Larry or might have been Craig. That interview is no longer on the web (or is it?). That would be fascinating to hear more about.
sorry to hijack your thread slud
About clusters: there are a few good talks floating around the web about architectures for distributed computing. Of course, it never hurts to have 10,000 computers to run your code. :)
I don’t work for Google; I am just a big fan of technology.
Also, having gone to the rival university of Google's founders makes me even more curious :)
I actually just left the grind of corporate America, and if I make any $$, I will be extremely happy. I worked for a multi-billion $$ financial institution and was sick of my dense, fancy-degreed bosses :). I am now using the internet as a medium for a new business.
Back to clustering: it has been around since the olden days of computing, but IMHO in the last couple of years the performance of these computing clusters has gone through the roof.
Having also worked for a major online corp with over 20 million unique records containing text fields, I know how challenging delivering sub-second responses to full-text queries is, but I also know it is an absolute necessity. Sites like Google, along with WW and all the major players on the net, have set the bar.
And IMO, any site wishing to be a major player, must return sub-second responses.
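The standard data structure behind those sub-second full-text responses is an inverted index: you pay the cost of tokenizing every record once at index time, and then a query is just a lookup and intersection of posting lists instead of a scan over all 20 million records. A minimal Python sketch (the records and field names are invented for illustration):

```python
# Toy inverted index: term -> set of record IDs. Records are made up.
from collections import defaultdict

records = {
    1: "cheap red widgets",
    2: "blue widgets on sale",
    3: "red gears and sprockets",
}

# Build the index once, up front.
index = defaultdict(set)
for rec_id, text in records.items():
    for term in text.lower().split():
        index[term].add(rec_id)

def query(*terms):
    """AND-query: intersect the posting list of each term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(query("red", "widgets"))  # [1]
```

Real engines add ranking, stemming, and compressed posting lists on top, but the lookup-instead-of-scan idea is why response time stays flat as the record count grows.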
Thanks for the kind responses, guys,
Paul