My memory of it is fading, but at the heart of it were a custom file system and a tweaked version of Linux. The custom file system allowed a single file to span the entire disk while still allowing random access. At the time, there was talk of the entire index fitting on a single 80 gig drive.
One of the sites I work on has object caching in the Java layer with a staleness flush, so after X minutes the cache slowly empties (each object has a slight randomness built into its expiry to prevent giant all-at-once flushes).
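For anyone who wants to picture the staleness flush, here is a minimal sketch of the idea, not the code from that site. The class name `JitteredTtlCache`, the `baseTtlMillis` and `maxJitterMillis` parameters, and the injectable clock are all made up for illustration. Each entry gets the base lifetime plus a random extra, so objects loaded at the same moment do not all expire together.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache: an entry is stale after baseTtlMillis plus a random
// jitter in [0, maxJitterMillis], which spreads expiries out over time.
final class JitteredTtlCache<K, V> {

    // A cached value together with the time at which it goes stale.
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long baseTtlMillis;
    private final long maxJitterMillis;
    private final LongSupplier clock; // injectable so the expiry logic can be tested

    JitteredTtlCache(long baseTtlMillis, long maxJitterMillis, LongSupplier clock) {
        if (baseTtlMillis < 0 || maxJitterMillis < 0) {
            throw new IllegalArgumentException("TTL and jitter must be non-negative");
        }
        this.baseTtlMillis = baseTtlMillis;
        this.maxJitterMillis = maxJitterMillis;
        this.clock = clock;
    }

    // Returns the cached value if it is still fresh; otherwise calls the
    // loader, stores the result with a new jittered expiry, and returns it.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value;
        }
        V value = loader.apply(key);
        long jitter = ThreadLocalRandom.current().nextLong(maxJitterMillis + 1);
        entries.put(key, new Entry<>(value, now + baseTtlMillis + jitter));
        return value;
    }
}
```

In real use you would pass `System::currentTimeMillis` as the clock and something like a database query as the loader. Note this sketch only replaces stale entries when they are next requested; it does not actively evict them, and two threads missing on the same key at once may both call the loader.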
The main area where we get performance gains is from repeat searches, and from people who want to page through the data. Paging used to be slow, now it just rocks along.
I would not be surprised to see some of the same tactics being employed. The aim, of course, is not so much "speed" as "consistency". It is better to have a consistent 3 or 4 second return time than to have most pages take 1 or 2 seconds and then one transaction take, say, 10 or 15 seconds. That's way more annoying for people.
Couple consistent speed expectations with some caching and interval updates, and it'll help you manage a great deal of content and serve it up in reasonable times. IMHO.
An article explaining Google's internals might explain how this search engine gets such Google-like accuracy. They link to the paper "The Anatomy of a Search Engine" http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm by Sergey Brin and Lawrence Page, which you may find interesting (although I don't think this is what Brett was talking about).