| 11:02 pm on Mar 31, 2003 (gmt 0)|
From their jobs site [google.com...] it seems they look for C++ plus Python (occasionally Perl) skills.
| 1:20 am on Apr 1, 2003 (gmt 0)|
From what I've read, the core stuff is all C/C++; some of the prototyping work is Python/Perl, as is some of the less heavily used back-end stuff.
| 1:23 am on Apr 1, 2003 (gmt 0)|
I wanna know what kind of database they use. Custom built? Object oriented?
| 3:25 am on Apr 1, 2003 (gmt 0)|
There was an article several years ago, which we believe has since been pulled from the web, that laid out Google's back-end system from a macro standpoint. If anyone knows the article I speak of, please drop a url.
My memory of it is fading, but at its heart was a custom file system and a tweaked version of Linux. The custom file system allowed single files to span an entire disk, with random access allowed. At the time, it said the entire index fit on a single 80 gig drive.
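The random-access pattern described there can be sketched with plain seek-and-read on one big file. This is a hypothetical illustration in Python (the file name, record size, and function are made up for the example; nothing here is Google's actual interface):

```python
# Sketch: fixed-size records in one huge file, fetched by jumping
# straight to a byte offset instead of scanning from the start.
RECORD_SIZE = 64  # assumed record width, purely illustrative

def read_record(path, record_no):
    """Read record number `record_no` from the file at `path`."""
    with open(path, "rb") as f:
        f.seek(record_no * RECORD_SIZE)  # random access: jump to the record
        return f.read(RECORD_SIZE)
```

The point is that a file laid out this way gives constant-time access to any record no matter how large the file grows, which is presumably why a file spanning a whole 80 gig drive was workable.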
| 5:10 am on Apr 1, 2003 (gmt 0)|
Love the title of this thread, especially the bit about hand-coded machine code. Do people still do that sort of thing?
I remember hand-coding 6502 machine code on an Atari using Rodnay Zaks' excellent book. My folks were too tight to buy the assembler cartridge.
| 5:17 am on Apr 1, 2003 (gmt 0)|
On a more serious note, the database likely uses some excellent caching techniques to help performance. Because the major updates come only once every x weeks, and freshbot doesn't alter the index that much, caching is a viable way to ensure search performance.
One of the sites I work on has object caching in the Java layer with a staleness flush, so after X minutes the cache slowly empties (each object has a little randomness built into its lifetime to prevent giant all-at-once flushes).
The main area where we get performance gains is repeat searches, and people who want to page through the data. Paging used to be slow; now it just rocks along.
I would not be surprised to see some of the same tactics being employed. The aim, of course, is not so much "speed" as "consistency". Better a consistent 3 or 4 second return time than most pages taking 1 or 2 seconds and then one transaction taking, say, 10 or 15 seconds. That's way more annoying for people.
Couple consistent speed expectations with some caching and interval updates, and you can manage a great deal of content and serve it up in reasonable times. IMHO.
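The staleness-flush idea in that post (per-object TTL plus a bit of randomness so entries don't all expire at once) can be sketched in a few lines. This is a minimal illustration, not the poster's Java code and not anything Google-specific; the class name and parameters are invented:

```python
import random
import time

class JitteredTTLCache:
    """Cache whose entries expire after a base TTL plus random jitter,
    so objects loaded together don't all flush at the same moment."""

    def __init__(self, ttl_seconds=300.0, jitter_seconds=60.0):
        self.ttl = ttl_seconds
        self.jitter = jitter_seconds
        self._store = {}  # key -> (value, expires_at)

    def put(self, key, value):
        # Each entry gets its own slightly randomized lifetime.
        expires_at = time.time() + self.ttl + random.uniform(0, self.jitter)
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # stale: evict and report a miss
            return None
        return value
```

For repeat searches and paging, a cache like this would serve the second and later requests without touching the index at all, which is where the consistency the post describes comes from: most requests hit the cache, so response times cluster tightly.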
| 5:26 am on Apr 1, 2003 (gmt 0)|
So GOOGLE doesn't use an M$ OS? Why am I not surprised?
| 5:38 am on Apr 1, 2003 (gmt 0)|
>I wanna know what kind of database they use..custom made?
Sounds 100% custom - that's what the file system is all about.
The only major search engine I know of that uses an off-the-shelf db is Wisenut.
| 5:43 am on Apr 1, 2003 (gmt 0)|
If you are interested in the database side of Google, check out http://www.aspseek.org/man/aspseek.7.php - it's a Google clone. I use it on my website and it rocks! It's coded in C++ and uses a combination of its own C++ storage and a MySQL database.
An article explaining Google's internals might explain how this search engine gets such Google-like accuracy. They link to the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm by Sergey Brin and Lawrence Page, which you may find interesting (although I don't think this is the article Brett was talking about).
| 5:53 am on Apr 1, 2003 (gmt 0)|