http://www.webmasterworld.com
Moderators: jcoronella & martinibuster

Yahoo Search Engine and Directory

  
Yahoo Webmap: Roughly 1 Trillion Links
Yahoo Implements Apache Hadoop To Process Webmap
engine


#:3581119
 2:33 pm on Feb. 21, 2008 (utc 0)

On a very related note, we're announcing today that we implemented what we believe is the world's largest commercial application of Apache Hadoop. We are now using Hadoop to process the Webmap -- the application which produces the index from the billions of pages crawled by Yahoo! Search.

Yahoo Implements Apache Hadoop To Process Webmap

More about Hadoop running in production on the Yahoo! Search Webmap

Some Webmap size data:

    Number of links between pages in the index: roughly 1 trillion links
    Size of output: over 300 TB, compressed!
    Number of cores used to run a single Map-Reduce job: over 10,000
    Raw disk used in the production cluster: over 5 Petabytes
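
For anyone unfamiliar with the Map-Reduce model mentioned in the figures above, here is a toy, single-machine sketch in Python of how a link graph could be processed in that style. It counts inbound links per page. This is only an illustration of the general pattern, not Yahoo's Webmap code and not the Hadoop API; the function names (`map_links`, `shuffle`, `reduce_counts`, `count_inlinks`) and the tiny `crawl` dictionary are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_links(page: str, outlinks: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (target, 1) for every link found on one crawled page."""
    for target in outlinks:
        yield target, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key (the link target)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(target: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the values for one key to get its inbound-link count."""
    return target, sum(counts)


def count_inlinks(crawl: Dict[str, List[str]]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory crawl."""
    mapped = (pair for page, links in crawl.items() for pair in map_links(page, links))
    grouped = shuffle(mapped)
    return dict(reduce_counts(target, counts) for target, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical three-page crawl: page -> pages it links to.
    crawl = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(count_inlinks(crawl))
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle between them; here everything runs in one process purely to show the shape of the computation.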

          jimbeetle


          #:3581202
           3:59 pm on Feb. 21, 2008 (utc 0)

          Very interesting interview by Jeremy Zawodny with two of the Y! engineers (there since the Inktomi days) about the Y! search infrastructure on that second linked page.

          carguy84


          #:3581717
           12:47 am on Feb. 22, 2008 (utc 0)

          1 Trillion links and the best Yahoo can return is:

          select top 10 *
          from links
          order by newid()

          lol

          ecmedia


          #:3582041
           2:03 pm on Feb. 22, 2008 (utc 0)

          Any time Yahoo tries to boast about how good it is, I go and do a search for some basic stuff -- disappointed yet again with nothing but crap. Hey Yahoo guys, you are in trouble either way. All by yourself you are useless and with Microsoft you are marrying another loser in the search business.

          blend27


          #:3582454
           9:58 pm on Feb. 22, 2008 (utc 0)

          I think this would double the quality and split the Trillion in half though:

          select *
          from links
          where id not in (select id from links where link like '%.info%')

          .

          DannyTweb


          #:3583034
           9:09 pm on Feb. 23, 2008 (utc 0)

          Yes, maybe now they can also clear the old Inktomi penalties, which they seem to be at a loss about how to remove.

           

          All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
          WebmasterWorld ® and PubCon ® are registered trademarks of WebmasterWorld Inc.
          © WebmasterWorld Inc. / SearchEngineWorld 1996-2008 all rights reserved