The supplemental collection of pages has been collected from the web just like the 3.3 billion pages in Google's main index.
Is that 3,300,000,000 (US billion) or 3,300,000,000,000 (old UK billion)?
Thanks
Delay between subsequent fetches: 2.5 seconds
Pages per day fetched by a single (and very lazy) thread: 34,560
Pages per day crawled by a single very cheap process(or) with 100 threads: 3,456,000
Days to crawl the whole web with a single process: ~2,894 (this figure implies a web of roughly 10 billion pages)
Number of computers needed for whole-web monthly recrawling:
2,894 / 30 ≈ 96
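For what it's worth, here is the same back-of-the-envelope math as a small Python script. The 10-billion-page web size is not stated above; it is the value implied by the ~2,894-day figure, so treat it as an assumption.

SECONDS_PER_DAY = 86_400
FETCH_DELAY_S = 2.5                  # polite delay between fetches in one thread
THREADS_PER_PROCESS = 100
WEB_SIZE_PAGES = 10_000_000_000      # assumption implied by the ~2,894-day figure
RECRAWL_PERIOD_DAYS = 30

pages_per_thread_per_day = SECONDS_PER_DAY / FETCH_DELAY_S                    # 34,560
pages_per_process_per_day = pages_per_thread_per_day * THREADS_PER_PROCESS    # 3,456,000
days_to_crawl_web = WEB_SIZE_PAGES / pages_per_process_per_day                # ~2,894
machines_for_monthly_recrawl = days_to_crawl_web / RECRAWL_PERIOD_DAYS        # ~96

print(round(pages_per_thread_per_day), round(pages_per_process_per_day))
print(round(days_to_crawl_web), "days;", round(machines_for_monthly_recrawl), "machines")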
Google has at least 10,000 crawlers (hardware, CPUs)...
100 threads with a 2.5-second delay can run concurrently on a single CPU, covering crawl + outlink extraction + new crawls + ... + indexing; replication also takes some time, so my calculations may be off by some percentage...
Still, the difference between 96 and 10,000 is huge, isn't it?
It could be 100 crawlers, 100 parsers, 100 indexers, 100 PageRank calculators, and 100 replicators, and it could still be fewer than 1,000 machines. Is BigDaddy really that big? 100 million websites at 1,000 pages per site (on average)... 100 billion pages...
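Plugging that 100-billion-page estimate into the same throughput assumptions as above (my extrapolation, not something stated in the post), the machine count grows tenfold but still stays well under 10,000:

pages = 100_000_000 * 1_000                               # 100 million sites x 1,000 pages each
pages_per_process_per_day = 3_456_000                     # same figure as above
days_single_process = pages / pages_per_process_per_day   # ~28,935 days
machines_for_monthly_recrawl = days_single_process / 30   # ~965 machines
print(round(machines_for_monthly_recrawl))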
(A multithreaded process with 2,500 lazy threads is better than a single-threaded one simply because there is always network delay, typically 1.5-2 seconds of response time; setting the technical jargon aside, the calculations remain the same.)
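To make the "lazy threads" point concrete, here is a small Python simulation; the latency is scaled down so it runs quickly, whereas the real figures above are about 2-2.5 seconds per fetch. Each simulated fetch just sleeps, so throughput grows roughly linearly with the thread count until CPU or bandwidth becomes the bottleneck.

import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_FETCH_S = 0.25   # stands in for the ~2.5 s per-fetch network delay

def fetch(url):
    # The thread is idle here, waiting on the "network"; the CPU is free
    # to run other threads, which is why many lazy threads beat one.
    time.sleep(SIMULATED_FETCH_S)
    return url

urls = ["http://example.com/page%d" % i for i in range(40)]

for workers in (1, 10, 40):
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fetch, urls))
    print("%2d threads: %5.1f s for %d fetches" % (workers, time.time() - start, len(urls)))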