Forum Moderators: open
Inquiring minds wanna KNOW!
[cs.rochester.edu...]
Heavy reading, but I was glad that I got through it and understood about 80%, enough to get a good idea of how part of google works (the Google File System) :)
I don't actually want to know what you PAY for the bandwidth (that doesn't interest me so much), but I'd love to know at least an approx figure on the total bandwidth transfer. It's hard to even guess. Are we talking
a) 0 - 100GB/day
b) 100GB - 1TB/day
c) 1 - 10TB/day
d) 10 - 100TB/day
e) Over 100TB/day
Pretty Please? :)
By accessing its index of more than 3 billion web pages, Google delivers relevant results to users all over the world, typically in less than half a second. Today, Google responds to more than 200 million search queries per day.
Ok, calc.exe out.
Google homepage is about 50 bytes (HTTP) + 4198 bytes (HTML) + 8557 bytes (Google logo) = 12805 bytes. TCP/IP overhead is about 4%, so call it 13317 bytes(A).
The first page of results, showing 10 listings and 8 AdWords works out at 50 bytes (HTTP) + 28834 bytes (HTML) + 8800 bytes (images) = 37684 * 1.04 = 39191 bytes(B).
Now Google serves queries from all over the place; not every one will spawn from the homepage; and some will proceed to additional pages of the results. Many will come directly in as a GET request. Some will be more, some will be less, and the images should be well cached by the big ISPs' proxy servers - so that will help. Even so - as a very rough estimate - we can just take that total and multiply by their own numbers:
((A)+(B)) * 200,000,000
= 52508 * 200,000,000
= 10,501,600,000,000 bytes (or about 10.5 Terabytes) just serving queries.
DNS is handled by Akamai, but you've still got to take into account spidering, ad serving, click-tracking and GoogleGuy surfing WebmasterWorld...
[Mod note per member request: That's Terabytes, not Petabytes.]
[edited by: Marcia at 10:31 am (utc) on Jan. 26, 2004]
[edit reason] Member request. [/edit]
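The back-of-the-envelope arithmetic above can be sketched in a few lines (all byte counts are the post's own estimates, not measurements):

```python
# Sanity check on the estimate in the post above.
HTTP_HEADERS = 50
HOMEPAGE_HTML = 4198
LOGO_GIF = 8557
RESULTS_HTML = 28834
RESULTS_IMAGES = 8800
TCP_IP_OVERHEAD = 1.04          # ~4% framing overhead
QUERIES_PER_DAY = 200_000_000   # Google's own published figure

homepage = round((HTTP_HEADERS + HOMEPAGE_HTML + LOGO_GIF) * TCP_IP_OVERHEAD)       # (A)
results = round((HTTP_HEADERS + RESULTS_HTML + RESULTS_IMAGES) * TCP_IP_OVERHEAD)   # (B)

total_bytes = (homepage + results) * QUERIES_PER_DAY
print(f"A = {homepage} bytes, B = {results} bytes")
print(f"total = {total_bytes / 1e12:.1f} TB/day just serving queries")
```

That lands on answer (d) in the original poll: 10 - 100TB/day.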
The first page of results, showing 10 listings and 8 AdWords works out at 50 bytes (HTTP) + 28834 bytes (HTML) + 8800 bytes (images) = 37684 * 1.04 = 39191 bytes(B).

That was pre-Florida. Now you've got webmasters setting their results per page to 100 and viewing the first ten pages trying to find their sites. ;)
petabytes? Sounds like food for abused dogs.
Greetings,
Herenvardö
googleguy, may i propose something?
you change the google source from "bgcolor=#ffffff" to "bgcolor=white" (2 characters less) and wire me the saved amount of bandwidth costs. sticky me for my bank account details.
;-)
(ps. there's still a lot of potential. what about a google source code optimization contest here on WebmasterWorld?)
1) how much do they pay per day for bandwidth charges (it's gotta be google-bucks)
2) how much of their system resources are devoted to spidering versus page serving?
3) what's their biggest system concern: CPU? bandwidth? storage?
4) how DO they do a search of 3 billion pages in a fraction of a second?
5) how many CPUs are devoted to SERVING search result pages?
1) It could be something around $200,000/month
2) We could calculate: multiply the resources needed to serve a search result by the total number of searches (200 million), and compare that with the resources needed to spider a page multiplied by 3,000 million (3 billion for Americans). Some pages are spidered many times a day and others only once a month or so, but you'll find the results come out similar.
3) Somebody who can spend so much money on bandwidth can have very powerful computers. As GG posted, it's logical to have them well balanced
4) "Does somebody have an 80YB hard disk or a 233ZHz CPU?" Maybe both questions have the same answer ;)
5) Only G knows. Maybe there are 20, or maybe there are 2000. But you can be sure there are a lot!
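On question 4, the textbook answer is an inverted index: for each word, keep a precomputed list of the pages containing it, then intersect the lists for the query terms. Lookup cost scales with the length of those lists, not with the 3 billion pages. A toy sketch, nothing like Google's real sharded implementation:

```python
# Toy inverted index: query time depends on the posting lists,
# not on the total number of documents. Sample docs are made up.
from collections import defaultdict

docs = {
    1: "cheap widget reviews",
    2: "widget factory news",
    3: "blue widget reviews and prices",
}

# Build the index once: word -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(query):
    # Intersect the posting lists of all query terms.
    lists = [index[w] for w in query.split()]
    return sorted(set.intersection(*lists)) if lists else []

print(search("widget reviews"))   # -> [1, 3]
```

Scale that idea across thousands of machines, each holding a shard of the index, and sub-second answers over billions of pages stop looking like magic.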
Greetings,
Herenvardö
you change the google source from "bgcolor=#ffffff" to "bgcolor=white" (2 characters less) and wire me the saved amount of bandwidth costs. sticky me for my bank account details.
Why not remove the bgcolor and text color attributes completely and use the user's default settings? It would save some 20~30 octets!
Greetings,
Herenvardö
2x200million results served a day = 4Mbytes/day saved = 120Mbytes/month. It's not so much.
you are mistaken (or am i?):
2x200million is 400 Mbytes a day = 12 Gbytes a month. as one query is at least two, more like three page impressions (PIs), this makes 36 Gbytes a month, by just two characters less.
take victor's suggestion and lots of others we haven't made yet and you can easily save google AND their users dozens of GB of traffic per month. then multiply that by the factor by which the number of queries grows each month. a terabyte is probably doable within the next 12 months.
ok, savings would still probably not exceed $1000, but hey, pages would load quicker and server resource allocation would be (slightly) relieved. worth it, i'd say.
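For what it's worth, the arithmetic checks out (the ~3 page impressions per query is the poster's guess, not an official figure):

```python
# Two bytes saved per page served, using the thread's own assumptions.
SAVED_BYTES = 2                  # "#ffffff" -> "white"
QUERIES_PER_DAY = 200_000_000
PAGES_PER_QUERY = 3              # assumed ~3 page impressions per query

daily = SAVED_BYTES * QUERIES_PER_DAY * PAGES_PER_QUERY
monthly = daily * 30
print(f"{daily / 1e6:.0f} MB/day, {monthly / 1e9:.0f} GB/month")
```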
Just remove the bgcolor="#FFFFFF" altogether. Most browsers give a white background when one is not specified. In fact, you can also remove the text="#000000" as black is the default :):) Wow, how'd you miss that one?
wasn't trying to give serious suggestions on how to reduce google's page sizes (there is much more to it than those body-tag tweaks). i wanted to point out the relatively huge effect such reductions would have (say, on the internet), by giving an example from elementary school.
so, is this the "WebmasterWorld google source code optimization contest" thread already?
;-)
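A contest entry of sorts: strip attributes whose values match what browsers render by default and count the bytes saved. The sample markup and the list of "default" values are assumptions for illustration:

```python
# Strip body-tag attributes that match assumed browser defaults
# (white background, black text) and report the bytes saved.
import re

html = '<body bgcolor="#ffffff" text="#000000" link="#0000cc">hello</body>'

DEFAULTS = [r'\s+bgcolor="#?ffffff"', r'\s+text="#?000000"']

slim = html
for pattern in DEFAULTS:
    slim = re.sub(pattern, "", slim, flags=re.IGNORECASE)

print(f"saved {len(html) - len(slim)} bytes per page")
```

Multiply the per-page saving by 200 million queries a day and even a few dozen bytes adds up fast.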
you are mistaken (or am i?): 2x200million is 400 Mbytes a day = 12 Gbytes a month. as one query is at least two, more like three page impressions, this makes 36 Gbytes a month, by just two characters less.
Yes, I was mistaken. Sorry :(
But even so, this would be a saving of only about 0.01% of monthly bandwidth... I don't know if it would truly be worth it...
Let G manage its finances, I'm here to help reveal the remaining Google mysteries ;)
In any case, using default colors and removing these attributes altogether seems to save the most bandwidth, by reducing the size of the code file.
But wouldn't it be a greater saving if browsers had a built-in zip/unzip utility? The files would be sent zipped down the line and the client would unzip them... less bandwidth all round... It's only a suggestion for browser makers ;)
Grtngs,
Hrnvrdö (ups, I zipped my signature ;))
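For what it's worth, HTTP/1.1 already defines exactly this: the browser sends Accept-Encoding: gzip and the server replies with Content-Encoding: gzip, unzipping transparently on the client. A quick sketch of what compression buys on repetitive HTML (the sample page is synthetic, but results pages compress similarly because HTML repeats itself a lot):

```python
# What on-the-wire compression buys for an HTML page.
import zlib

page = ('<tr><td><a href="http://example.com/">Example result</a>'
        '<br><font size="-1">Snippet text goes here.</font></td></tr>\n' * 100)

compressed = zlib.compress(page.encode("ascii"), 9)
ratio = len(compressed) / len(page)
print(f"{len(page)} -> {len(compressed)} bytes ({ratio:.0%} of original)")
```

That's a far bigger win than shaving two characters off a bgcolor attribute.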