

How big is Google?

How many terabytes?


xcandyman

1:09 pm on May 8, 2003 (gmt 0)

10+ Year Member



I have heard that they handle hundreds of terabytes of data. Has anyone got a more exact-ish kind of figure?

GG?

Thanks

Steve

takagi

1:53 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can't find the number of terabytes, but in the Google Revenue to skyrocket to $750 million this year [webmasterworld.com] thread you can find the following information about Google's hardware collection:

...said it consists of more than 54,000 servers designed by Google engineers from basic components. It contains about 100,000 processors and 261,000 disks...

creative craig

1:55 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Isn't it the biggest Linux cluster in the world?

Craig

creative craig

1:58 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I found this on the Google site:

www.google.com/press/highlights.html

It gives a good rundown of their technical highlights :)

Craig

Critter

2:04 pm on May 8, 2003 (gmt 0)

10+ Year Member



Let's do the math :)

We're working with 3,083,324,652 pages.

The cache of pages, assuming each page averages 4K and is compressed with gzip (50% compression is typical for text), comes to about 5.74 terabytes.

The forward index is going to be about 60% of this size, and so is the backward index--so we have another 3.44 terabytes for each index = 6.88 terabytes.

Custom indexes like title tag and heading indexes, as well as domain/url indexes (for link/allinurl) are going to be substantially smaller...let's say 10% of the compressed cache = 0.57 terabytes.

And, of course, to be conservative we'll double our total, because you never know what Google's up to :)

Grand total estimate: 2 x (5.74 + 6.88 + 0.57) = 26.38 terabytes. Which, by the way, can be comfortably housed in *one* NetApp NAS cabinet.
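If you want to check the arithmetic yourself, here's a quick back-of-envelope script. The 4K page size, 50% gzip ratio, and the 60%/10% index fractions are just the assumptions above, not anything Google has confirmed:

# Rough back-of-envelope estimate of index size, using the assumptions
# from this post; none of these figures are confirmed by Google.
PAGES = 3_083_324_652            # pages claimed on the Google homepage
AVG_PAGE_BYTES = 4 * 1024        # assume ~4K of text per page
GZIP_RATIO = 0.5                 # assume ~50% compression for text
TB = 2 ** 40                     # bytes per terabyte (binary)

cache_tb = PAGES * AVG_PAGE_BYTES * GZIP_RATIO / TB
index_tb = 2 * 0.60 * cache_tb   # forward + backward index, ~60% of cache each
custom_tb = 0.10 * cache_tb      # title/heading/url indexes, ~10% of cache

total_tb = 2 * (cache_tb + index_tb + custom_tb)   # doubled to be conservative
print(f"cache {cache_tb:.2f} TB, indexes {index_tb:.2f} TB, "
      f"custom {custom_tb:.2f} TB, total {total_tb:.2f} TB")
# prints roughly: cache 5.74 TB, indexes 6.89 TB, custom 0.57 TB, total 26.42 TB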

Peter

creative craig

2:10 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This page from Stanford may help, explaining how they store data and how it is used and retrieved for a search:

The anatomy of a search engine [www-db.stanford.edu]

Craig

takagi

2:11 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What about the 700 million Usenet messages and the 425 million images?

xcandyman

2:21 pm on May 8, 2003 (gmt 0)

10+ Year Member



On their jobs page, one listing states:

Building large-scale distributed file systems and other infrastructure that makes it possible to reliably and efficiently manage and process hundreds of terabytes of information.

That's where I got the hundreds from.

Steve

creative craig

2:23 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You only gave them 26 terabytes of data though, not hundreds :)

Craig

Critter

2:29 pm on May 8, 2003 (gmt 0)

10+ Year Member



Add in the Usenet stuff and the images (which are resampled to be smaller) and I'd still only add another 10 terabytes or so.

The article didn't seem to definitely say that Google *had* hundreds of terabytes, just that they were building the infrastructure to handle it.

Remember as well that if this is duplicated across 8 datacenters or so, we'll have to multiply (30 terabytes x 8 = 240 terabytes).

But as far as non-duplicated data goes, probably only 20-30 terabytes at this time.

Guys... you've got to remember that a terabyte is a LOT of information.

Peter

brotherhood of LAN

2:35 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What if they were to check pages for duplicate content... Is the web really only a couple of megs big? ;)

If they ordered the pages by similarity and just encoded the differences between them (rough sketch of the idea below), I could see those terabytes becoming much more manageable. Hats off to them anyway; they do a good job of sorting a lot of information they have never read or heard of!
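Something along these lines, assuming you had already grouped near-duplicate pages together. The two sample pages here are made up purely for illustration, and this is not how Google actually stores its cache:

# Toy sketch of the "store only the difference between similar pages" idea.
import difflib
import zlib

page_a = """<html>
<head><title>Widget Catalogue</title></head>
<body>
<h1>Widgets</h1>
<p>Our full range of widgets, updated spring 2003.</p>
<p>Contact us for a quote.</p>
</body>
</html>
"""
page_b = page_a.replace("spring 2003", "summer 2003")  # near-duplicate page

# Store page_b as a diff against page_a instead of storing it in full.
delta = "".join(difflib.unified_diff(page_a.splitlines(keepends=True),
                                     page_b.splitlines(keepends=True),
                                     n=0))

full_size = len(zlib.compress(page_b.encode()))
delta_size = len(zlib.compress(delta.encode()))
print(f"compressed full page: {full_size} bytes, compressed delta: {delta_size} bytes")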

xcandyman

2:36 pm on May 8, 2003 (gmt 0)

10+ Year Member



You have also got to remember all the other data they store which is not available via the search, like all the data on banned sites. I bet they have information on nearly every static page on the internet.

The mind boggles. How about some stats GG?

Steve

creative craig

2:40 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have just been reading the page I quoted, The Anatomy of a Search Engine. We need a fresher version, as that paper is based on a 24 million page index, and the index has grown somewhat since then. ;)

The amount of info in that article is enough to keep any interested SEO busy for a month.

Craig

takagi

2:50 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the thread started by GoogleGuy to inform webmasters about filtering expired domains [webmasterworld.com] two months ago, he wrote in message 19:

.. we're using multiple sources of data stretching back to 2000 in order to cross-check. No one should get caught accidently.

So xcandyman is right about Google having a lot more information about (static) pages.

xcandyman

2:53 pm on May 8, 2003 (gmt 0)

10+ Year Member



Just been looking into size and speed, and I came across what I want as a new connection instead of my DSL:

Dense Wavelength Division Multiplexing (DWDM)
Multiple data signals carried on different wavelengths of light.

10.9 Tbps, or 10,900,000 Mbps

It would take about 2 seconds to transfer 2.5 terabytes on this baby.
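The arithmetic behind that, for anyone checking (ignoring protocol overhead, and treating terabytes and terabits loosely as decimal units):

# Rough sanity check on the DWDM transfer-time claim above.
link_tbps = 10.9                    # line rate: 10.9 terabits per second
data_tb = 2.5                       # data to move: 2.5 terabytes
seconds = data_tb * 8 / link_tbps   # terabytes -> terabits, then divide by rate
print(f"{seconds:.2f} seconds")     # ~1.83 s, so "about 2 seconds" holds up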

Drool :)