Forum Moderators: open
...said it consists of more than 54,000 servers designed by Google engineers from basic components. It contains about 100,000 processors and 261,000 disks,
We're working with 3,083,324,652 pages.
The cache of pages, assuming each page averages 4 KB and each is compressed with gzip (50% compression is typical for text), brings us to 5.74 terabytes.
The forward index is going to be about 60% of this size, and so is the backward index--so we have another 3.44 terabytes for each index = 6.88 terabytes.
Custom indexes like title tag and heading indexes, as well as domain/url indexes (for link/allinurl) are going to be substantially smaller...let's say 10% of the compressed cache = 0.57 terabytes.
And, of course, to be conservative we'll double our total, 'cause you never know what Google's up to :)
Grand total estimate: 26.38 terabytes. Which, by the way, can be comfortably housed in *one* NetApp NAS cabinet.
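For anyone who wants to play with the numbers, here's the whole back-of-envelope estimate as a few lines of Python (using binary terabytes, i.e. 2^40 bytes, which is how the figures above work out; small rounding differences aside, it lands on the same ~26 TB total):

```python
# Back-of-envelope sketch of the estimate above.
# Assumptions (all from the post): 4 KB average page, 50% gzip compression,
# forward and backward indexes each ~60% of the compressed cache,
# custom indexes ~10%, and the whole thing doubled to be conservative.
PAGES = 3_083_324_652
TB = 2 ** 40  # binary terabyte

cache = PAGES * 4096 * 0.50 / TB        # compressed page cache
indexes = 2 * 0.60 * cache              # forward + backward index
custom = 0.10 * cache                   # title/heading/domain/url indexes
total = 2 * (cache + indexes + custom)  # conservative doubling

print(f"cache={cache:.2f} TB, indexes={indexes:.2f} TB, "
      f"custom={custom:.2f} TB, total={total:.2f} TB")
```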
Peter
The anatomy of a search engine [www-db.stanford.edu]
Craig
The article didn't seem to definitely say that Google *had* 100s of terabytes, just that they were building the infrastructure to handle it.
Remember as well that if this is duplicated across 8 datacenters or so, we'll have to multiply (30 terabytes x 8 = 240 terabytes).
But as far as non-duplicated data probably only 20-30 terabytes at this time.
Guys... you've got to remember that a terabyte is a LOT of information.
Peter
If they ordered the pages by similarity and just encoded the differences between them, I see those terabytes becoming more manageable. Hats off to them anyway, they do a good job of sorting a lot of information they have never read or heard of!
.. we're using multiple sources of data stretching back to 2000 in order to cross-check. No one should get caught accidentally.
So xcandyman is right about Google having a lot more information about (static) pages.
Dense Wave Division Multiplexing ( DWDM )
Multiple data signals carried on different wavelengths of light.
10.9 Tbps (10,900,000 Mbps)
It would take about 2 seconds to transfer 2.5 terabytes on this baby.
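Quick sanity check on that claim (raw line rate only, ignoring protocol overhead):

```python
# Time to push 2.5 TB over a 10.9 Tbps DWDM link, at the raw line rate.
terabytes = 2.5
link_tbps = 10.9

terabits = terabytes * 8        # data to send, in terabits
seconds = terabits / link_tbps  # ~1.8 s, i.e. roughly the 2 s claimed

print(f"{seconds:.2f} s")
```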
Drool :)