Forum Moderators: open
On a very related note, we're announcing today that we implemented what we believe is the world's largest commercial application of Apache Hadoop. We are now using Hadoop to process the Webmap -- the application which produces the index from the billions of pages crawled by Yahoo! Search.
Yahoo Implements Apache Hadoop To Process Webmap [ysearchblog.com]
More about Hadoop running in production on the Yahoo! Search Webmap [developer.yahoo.net]
Some Webmap size data:
Number of links between pages in the index: roughly 1 trillion links
Size of output: over 300 TB, compressed!
Number of cores used to run a single Map-Reduce job: over 10,000
Raw disk used in the production cluster: over 5 Petabytes
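For anyone unfamiliar with what a Map-Reduce job like the Webmap actually does, here's a minimal single-machine sketch in Python of the pattern Hadoop distributes across those 10,000+ cores. This is not Yahoo's code -- the link data and function names are made up for illustration -- it just shows the map (emit key/value pairs per link) and reduce (aggregate per key) phases, here counting inbound links per page:

```python
from collections import defaultdict

def map_links(record):
    # Map phase: for each (source, target) link record,
    # emit one (key, value) pair crediting the linked-to page.
    source, target = record
    yield (target, 1)

def reduce_counts(key, values):
    # Reduce phase: sum the per-link counts for one target page.
    return (key, sum(values))

def run_mapreduce(records):
    # Shuffle step: group mapped values by key, then reduce each group.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_links(record):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())

# Hypothetical link records: (page that links, page linked to)
links = [
    ("a.com", "b.com"),
    ("c.com", "b.com"),
    ("b.com", "a.com"),
]
print(run_mapreduce(links))  # {'b.com': 2, 'a.com': 1}
```

At Webmap scale the framework runs the map and reduce functions in parallel across the cluster and handles the shuffle over the network, which is what makes the trillion-link job feasible.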