
Webmaster General Forum

Common Crawl - now everyone can be Google
a way to access an index created to make the web more open


Msg#: 4387151 posted 2:19 pm on Nov 15, 2011 (gmt 0)

Common Crawl Foundation provides another opportunity for Google challengers:

Many see Google’s stranglehold on search information as contrary to the Web’s ethos of openness and freely available information. The Common Crawl Foundation has now announced a new index, created to make the web more open, that anyone can access.

The Foundation says:
"Common Crawl is a Web Scale crawl, and as such, each version of our crawl contains billions of documents from the various sites that we are successfully able to crawl. This dataset can be tens of terabytes in size, making transfer of the crawl to interested third parties costly and impractical. In addition to this, performing data processing operations on a dataset this large requires parallel processing techniques, and a potentially large computer cluster. Luckily for us, Amazon's EC2/S3 cloud computing infrastructure provides us with both a theoretically unlimited storage capacity coupled with localized access to an elastic compute cloud."
[i-programmer.info]

Common Crawl [commoncrawl.org]


