remaining Google mysteries...

         

RoySpencer

11:42 pm on Jan 23, 2004 (gmt 0)

10+ Year Member



In my 8 long months as a parttime-webmaster-wannabe, I still have the following mysteries in the back of my noggin:
1) how much do they pay per day for bandwidth charges (it's gotta be google-bucks)
2) how much of their system resources are devoted to spidering versus page serving?
3) what's their biggest system concern: CPU? bandwidth? storage?
4) how DO they do a search of 3 billion pages in a fraction of a second?
5) how many CPUs are devoted to SERVING search result pages?

Inquiring minds wanna KNOW!

GoogleGuy

9:24 pm on Jan 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm. We've never really talked about 1&2. As far as #3, it makes sense that you want to optimize for all three, right? There's no point in having lots of computers if you don't take advantage of everything they can offer, so it's a bit of a balancing act. If you'd like to know more, I think this thread points to a recent useful article:
[webmasterworld.com...]

TheDave

10:38 am on Jan 25, 2004 (gmt 0)

10+ Year Member



There's some info on the Google File System here

[cs.rochester.edu...]

Heavy reading, but I was glad that I got through it and understood about 80%, enough to get a good idea of how part of google works (the Google File System) :)

Chico_Loco

10:03 pm on Jan 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



C'mon GoogleGuy, at least give us a partial answer to No.1 :) It is very interesting for most webmasters.

I don't actually want to know what you PAY for the bandwidth (that doesn't interest me so much), but I'd love to know at least an approx figure on the total bandwidth transfer. It's hard to even guess. Are we talking

a) 0 - 100GB/day
b) 100GB - 1TB/day
c) 1 - 10TB/day
d) 10 - 100TB/day
e) Over 100TB/day

Pretty Please? :)

dmorison

11:12 pm on Jan 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From [google.com...]

By accessing its index of more than 3 billion web pages, Google delivers relevant results to users all over the world, typically in less than half a second. Today, Google responds to more than 200 million search queries per day.

Ok, calc.exe out.

Google homepage is about 50 bytes (HTTP) + 4198 bytes (HTML) + 8557 bytes (Google logo) = 12805 bytes. TCP/IP overhead is about 4%, so call it 12805 * 1.04 = 13317 bytes(A).

The first page of results, showing 10 listings and 8 AdWords works out at 50 bytes (HTTP) + 28834 bytes (HTML) + 8800 bytes (images) = 37684 * 1.04 = 39191 bytes(B).

Now Google serves queries from all over the place; not every one will start from the homepage, and some will proceed to additional pages of the results. Many will come directly in as a GET request. Some will be more, some will be less, and the images should be well cached by the big ISPs' proxy servers - so that will help. Even so - as a very rough estimate - we can just take that total and multiply by their own numbers:

((A)+(B)) * 200,000,000

= 52508 * 200,000,000

= 10,501,600,000,000 bytes (or about 10.5 Petabytes) just serving queries.
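
A rough sketch of the same estimate in Python, for anyone who wants to rerun it with different numbers. The byte counts and the 200 million queries per day are the figures quoted in this post. The function name and the integer rounding of the 1.04 overhead multiplier are only illustrative assumptions.

```python
# Figures taken from the post; the helper name is hypothetical.
def with_overhead(payload_bytes, overhead_percent=4):
    """Add TCP/IP overhead as a percentage, rounding down to whole bytes."""
    return payload_bytes * (100 + overhead_percent) // 100

homepage = with_overhead(50 + 4198 + 8557)        # (A): HTTP + HTML + logo
results_page = with_overhead(50 + 28834 + 8800)   # (B): HTTP + HTML + images
queries_per_day = 200_000_000

bytes_per_day = (homepage + results_page) * queries_per_day

print("A =", homepage, "bytes")
print("B =", results_page, "bytes")
print("total =", bytes_per_day, "bytes per day")
print("      =", bytes_per_day / 10**12, "x 10^12 bytes per day")
```

Changing `queries_per_day` or the page sizes shows how sensitive the total is to each guess.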

DNS is handled by Akamai, but you've still got to take into account spidering, ad serving, click-tracking and GoogleGuy surfing WebmasterWorld...

[Mod note per member request: That's Terabytes, not Petabytes.]

[edited by: Marcia at 10:31 am (utc) on Jan. 26, 2004]
[edit reason] Member request. [/edit]

BBB_Guy

11:20 pm on Jan 25, 2004 (gmt 0)

10+ Year Member



>> GoogleGuy surfing WebmasterWorld...

That's where the main resources are going...

danny

1:03 am on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think that's actually only 10 terabytes, not 10 petabytes.

shrirch

1:07 am on Jan 26, 2004 (gmt 0)

10+ Year Member



>> about 10.5 Petabytes) just serving queries

A large number of these would go to partners through dedicated connections, with (I'd assume) barebones XML (if there is such a thing) and compression / caching hardware in play.

JudgeJeffries

1:55 am on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hell....and I thought lawyers were boring!

Powdork

5:03 am on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first page of results, showing 10 listings and 8 AdWords works out at 50 bytes (HTTP) + 28834 bytes (HTML) + 8800 bytes (images) = 37684 * 1.04 = 39191 bytes(B).
That was pre-Florida. Now you've got webmasters setting their results per page to 100 and viewing the first ten pages trying to find their sites. ;)

dmorison

6:38 am on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think that's actually only 10 terabytes, not 10 petabytes.

Of course it is; standing corrected. It was 11PM on a Sunday....

Powdork

6:41 am on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



petabytes? Sounds like food for abused dogs.:)

gerwin

8:20 am on Jan 26, 2004 (gmt 0)

10+ Year Member



10 Petabytes? Holy Glow, they must really make a lot of money to pay for that much traffic, and we are not even counting the servers located all over the world, the hard disk space, and the traffic that the Googlebots generate.

dmorison

8:54 am on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's Tera - not Peta!

Mods - any chance you could edit the original and leave a note to that effect?

Herenvardo

9:47 am on Jan 26, 2004 (gmt 0)

10+ Year Member



10,501,600,000,000 bytes = 10,255,468,750 kilobytes = 10,015,106 megabytes = 9,780 gigabytes = 9.55 terabytes = 0.009327 petabytes.
It's tera and not peta, and remember that when speaking about bytes, the conversion factors are 1024 instead of 1000, so it is less than 10 TB.

petabytes? Sounds like food for abused dogs.

In the International System of Units there are prefixes for multiples and submultiples of units. Since submultiples of a byte would be nonsense (there are only the bits), only the multiples are used:
1 kilo (k) = 1000 units
1 mega (M) = 1000 kilo
1 giga (G) = 1000 mega
1 tera (T) = 1000 giga
1 peta (P) = 1000 tera
1 exa (E) = 1000 peta
1 zetta (Z) = 1000 exa
1 yotta (Y) = 1000 zetta ... and there are more...
When talking about bytes, the factors of 1000 are replaced by 1024 (2 to the 10th power), 'cos it's easier for computers to make these conversions. Of course, some of these are nonsense in the field of informatics, but they could be applied to any unit. Does anybody have an 80 YB hard disk or a 233 ZHz CPU? ;)
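
To make the difference between the two conventions concrete, here is a small Python sketch that divides the 10,501,600,000,000 bytes from the estimate above step by step, once by 1024 and once by 1000. The function name and the dictionary layout are just illustrative choices, not anything from the thread.

```python
def to_units(n_bytes, factor):
    """Express a byte count in kilo, mega, giga, tera and peta units
    by dividing repeatedly by the given factor (1000 or 1024)."""
    names = ["kilo", "mega", "giga", "tera", "peta"]
    result = {}
    value = float(n_bytes)
    for name in names:
        value /= factor
        result[name] = value
    return result

total = 10_501_600_000_000  # bytes per day, from the estimate in this thread

binary = to_units(total, 1024)   # factors of 2**10
decimal = to_units(total, 1000)  # SI factors of 10**3

for name in binary:
    print(f"{name:>5}: {binary[name]:>18,.6f} (x1024)   {decimal[name]:>18,.6f} (x1000)")
```

Either way the answer is on the order of 10 tera, so which factor you pick changes the figure by about 10% at this scale, not by a factor of a thousand.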

Greetings,
Herenvardö

muesli

10:04 am on Jan 26, 2004 (gmt 0)

10+ Year Member



so google pays a bandwidth bill of about 0.3 petabytes a month.

googleguy, may i propose something?

you change the google source from "bgcolor=#ffffff" to "bgcolor=white" (2 characters less) and wire me the saved amount of bandwidth costs. sticky me for my bank account details.

;-)

(ps. there's still a lot of potential. what about a google source code optimization contest here on WebmasterWorld?)

Herenvardo

10:22 am on Jan 26, 2004 (gmt 0)

10+ Year Member



1) how much do they pay per day for bandwidth charges (it's gotta be google-bucks)
2) how much of their system resources are devoted to spidering versus page serving?
3) what's their biggest system concern: CPU? bandwidth? storage?
4) how DO they do a search of 3 billion pages in a fraction of a second?
5) how many CPUs are devoted to SERVING search result pages?

1) It could be something around $200,000/month
2) We could calculate: multiply the resources needed to serve one search result by the total number of searches (200 million) and compare that with the resources needed to spider one page multiplied by 3000 million (3 billion for Americans). Some pages are spidered many times a day and others only once a month or so, so you'll find that the results are similar.
3) Somebody who can spend so much money on bandwidth can have very powerful computers. As GG posted, it's logical to keep them well balanced.
4) "Does anybody have an 80 YB hard disk or a 233 ZHz CPU?" Maybe both questions have the same answer ;)
5) Only G knows. Maybe there are 20, or maybe there are 2000. But you can be sure that there are a lot!

Greetings,
Herenvardö

victor

10:25 am on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



you change the google source from "bgcolor=#ffffff" to "bgcolor=white"

I want half of what muesli gets for suggesting "bgcolor=#fff" -- that's another octet saved :)

Herenvardo

10:28 am on Jan 26, 2004 (gmt 0)

10+ Year Member



Oops! It's me again! :P
you change the google source from "bgcolor=#ffffff" to "bgcolor=white" (2 characters less) and wire me the saved amount of bandwidth costs. sticky me for my bank account details.

2x200million results served a day = 4Mbytes/day saved = 120Mbytes/month. It's not so much. And GG has sticky mail disabled, 150k wouldn't last for the mail GG would receive if this was enabled.

Why not remove the bgcolor and text color attributes completely and use the user's default settings? It would save some 20~30 octets!

Greetings,
Herenvardö

muesli

12:14 pm on Jan 26, 2004 (gmt 0)

10+ Year Member



2x200million results served a day = 4Mbytes/day saved = 120Mbytes/month. It's not so much.

you are mistaken (or am i?):

2x200million is 400 Mbytes a day = 12 Gbytes a month. as one query is at least two, rather three PIs, this makes 36 Gbytes a month, by just two characters less.

take victor's suggestion and lots of others we haven't made yet and you can easily save google AND their users dozens of GB of traffic per month. then multiply that by the factor by which the number of queries grows each month. a terabyte is probably doable for the next 12 months.

ok, savings would still probably not exceed $1000, but hey, pages would load quicker and server resource allocation would be (slightly) relieved. worth it, i'd say.
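
For the record, here is the two-character saving worked through as a short Python sketch. The 200 million pages a day and the "three PIs per query" are the thread's own figures. The 30-day month and the decimal units (1 GB = 10^9 bytes) are assumptions made only for this illustration.

```python
old_attr = "bgcolor=#ffffff"
new_attr = "bgcolor=white"

# Characters saved per page served (one byte each in ASCII).
saved_per_page = len(old_attr) - len(new_attr)

pages_per_day = 200_000_000
days_per_month = 30            # assumption: a 30-day month
impressions_per_query = 3      # the "rather three PIs" guess from the post

daily_bytes = saved_per_page * pages_per_day
monthly_bytes = daily_bytes * days_per_month
monthly_bytes_with_pis = monthly_bytes * impressions_per_query

print("saved per page:", saved_per_page, "bytes")
print("per day:", daily_bytes / 10**6, "MB")
print("per month:", monthly_bytes / 10**9, "GB")
print("per month at 3 PIs per query:", monthly_bytes_with_pis / 10**9, "GB")
```

Swapping in victor's "bgcolor=#fff" for `new_attr` gives three characters per page instead of two, and everything downstream scales with it.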

Essex_boy

1:13 pm on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Argggh! Modern day trainspotters!

Just accept the fact that it's a lot, whatever the figures are.

Chico_Loco

4:04 pm on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I have to jump in here just in case... Just remove the bgcolor="#FFFFFF" altogether. All browsers give a white background when one is not specified. In fact, you can also remove the text="#000000" too as black is the default :):) Wow, how'd you miss that one?

moltar

4:44 pm on Jan 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe third-generation browsers have a gray background color by default.

finer9

5:00 pm on Jan 26, 2004 (gmt 0)

10+ Year Member



Can't help but think about 'Office Space' during this thread....if you just take the rounded-off pennies and funnel them to an account...hehe

muesli

5:19 pm on Jan 26, 2004 (gmt 0)

10+ Year Member



Chico_Loco:
Just remove the bgcolor="#FFFFFF" altogether. All browsers give a white background when one is not specified. In fact, you can also remove the text="#000000" too as black is the default :):) Wow, how'd you miss that one?

wasn't trying to give serious suggestions on how to reduce google's page sizes (there is much more to it than those body-tag tweaks). i wanted to point out the relatively huge effect such reductions would have (say, to the internet), by giving an example from elementary school.

so, is this the "WebmasterWorld google source code optimization contest" thread already?

;-)

Chndru

2:08 pm on Jan 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dmorison,
your calculations miss my favorite search, Image search with 100 search results/page. That oughta suck up the bandwidth. :)

Essel

2:29 pm on Jan 29, 2004 (gmt 0)

10+ Year Member



Aren't the images served by the web server the image is on, rather than by Google?

Herenvardo

9:36 am on Jan 30, 2004 (gmt 0)

10+ Year Member



you are mistaken (or am i?):

2x200million is 400 Mbytes a day = 12 Gbytes a month. as one query is at least two, rather three PIs, this makes 36 Gbytes a month, by just two characters less.

Yes, I was mistaken. Sorry :(
But even so, this would be a saving of only about 0.01% of bandwidth... I don't know if it would truly be worth it...

Let G manage its finances, I'm here to help revealing the remaining Google mysteries ;)
In any case, using the default colors and removing these attributes altogether seems to be the biggest bandwidth saving you can get from reducing the size of the code.
But wouldn't it be an even greater saving if browsers had a built-in zip/unzip utility? The files would be sent zipped over the line and the client would unzip them... less bandwidth overall... It's only a suggestion for browser makers ;)
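
Something close to this already exists: HTTP/1.1 lets a server send a response with Content-Encoding: gzip when the browser advertises Accept-Encoding: gzip. Whether Google used it at the time is not something this thread establishes. The Python sketch below only illustrates the idea with the standard gzip module. The HTML is a made-up stand-in for a results page, not Google's actual markup.

```python
import gzip

# Hypothetical stand-in for a results page: repetitive markup compresses well.
html = (
    "<html><body bgcolor=#ffffff text=#000000>"
    + "".join(
        f"<p><a href='/result{i}'>Result {i}</a> some snippet text here</p>"
        for i in range(10)
    )
    + "</body></html>"
).encode("utf-8")

compressed = gzip.compress(html)        # what the server would send
restored = gzip.decompress(compressed)  # what the browser would reconstruct

print("original bytes:  ", len(html))
print("compressed bytes:", len(compressed))
print("round trip intact:", restored == html)
```

On repetitive HTML like this the compressed size is a fraction of the original, which dwarfs anything gained by trimming a couple of attribute characters.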

Grtngs,
Hrnvrdö (oops, I zipped my signature ;))

Brett_Tabke

11:05 am on Jan 30, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



TheDave! Nice link dude. We've been looking for that PDF for a long time. It came through the forums a few years ago, and we lost it! Thanks!

[cs.rochester.edu...]