


Behind the Scenes at Google IT Department

1:56 am on Jul 11, 2006 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke — Top Contributor of All Time, 10+ Year Member

joined:Sept 21, 1999
votes: 96

An absolutely righteous article every Google watcher should read:


For example, the GFS ensures that for every file, at least three copies are stored on different computers in a given server cluster. That means if a computer program tries to read a file from one of those computers, and it fails to respond within a few milliseconds, at least two others will be able to fulfill the request. Such redundancy is important because Google's search system regularly experiences "application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking and power supplies," according to the paper.
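The failover pattern described above can be sketched in a few lines. This is a hypothetical illustration, not Google's code: the replica names, the `read_from` stand-in, and the timeout value are all invented for the example. The idea is simply that a reader tries one copy, waits only a few milliseconds for a response, and falls through to the next replica on failure.

```python
import concurrent.futures

# Invented replica list and timeout, for illustration only.
REPLICAS = ["chunkserver-a", "chunkserver-b", "chunkserver-c"]
TIMEOUT_S = 0.005  # "a few milliseconds"

def read_from(server, path):
    # Stand-in for a network read. A dead or overloaded server
    # would hang or raise instead of returning promptly.
    return f"<contents of {path} from {server}>"

def read_with_failover(path, replicas=REPLICAS):
    # One worker per replica so a hung attempt can't block the next one.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        for server in replicas:
            future = pool.submit(read_from, server, path)
            try:
                # If this copy doesn't answer within the deadline,
                # move on to the next copy instead of waiting.
                return future.result(timeout=TIMEOUT_S)
            except (concurrent.futures.TimeoutError, OSError):
                continue
    raise IOError(f"all replicas failed for {path}")
```

With three copies of every file, all three replicas would have to fail before a read is lost — which is the redundancy the paper is describing.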
2:52 am on July 11, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 8, 2003
votes: 0

A nice round-up but nothing new.
5:02 am on July 11, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 26, 2003
votes: 0

Egad! Now they are building in duplicate content!
5:55 am on July 11, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 19, 2004
votes: 0

Regarding the recent 'anomalies' with Google, this is pretty relevant information...
11:06 am on July 11, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 29, 2003
votes: 0

Google's market share for search among U.S. Internet users reached 43% in April, compared with 28% for Yahoo and 12.9% for The Microsoft Network (MSN).

Does anybody know how they define market share? It can't just be clicks on search results, can it?

12:22 pm on July 11, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 14, 2005
votes: 0

Just Guessing, I think that the sentiment Brett is thrusting to encapsulate is one of splendour and marvel, in deep appreciation of the merit of the aforementioned article.

Or as we Brits might say, it's "wicked"

11:36 am on July 13, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 25, 2003
votes: 0

Do you think Google have signed up with Hitwise :)
4:07 pm on July 18, 2006 (gmt 0)

Full Member

10+ Year Member

joined:May 13, 2004
votes: 0

I signed up, printed the PDF, and ... finally ... have read it all. Lots of information in there, after the initial summary.

Among many interesting bits, this stood out (emphasis mine):

Google's systems seem to work well for Google. But if you could run your own systems on the Google File System, would you want to? Or is this an architecture only a search engine could love?

Distributed file systems have been around since the 1980s, when the Sun Microsystems Network File System and the Andrew File System, developed at Carnegie Mellon University, first appeared. Software engineer and blogger Jeff Darcy says the system has a lot in common with the HighRoad system he worked on at EMC. However, he notes that Google's decision to "relax" conventional requirements for file system data consistency in the pursuit of performance makes its system "unsuitable for a wide variety of potential applications." And because it doesn't support operating system integration, it's really more of a storage utility than a file system per se, he says. To a large extent, Google's design strikes him as more of a synthesis of many prior efforts in distributed storage.

So Google has an amazing, exotic, distributed (and proprietary) file system, which, it seems, really only works well for Google's own applications ... and in part because of the decision to opt for performance over consistency?!
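To make that trade-off concrete, here is a toy sketch (not Google's code, and the class and record names are invented) of the kind of relaxed semantics Darcy is pointing at: an at-least-once append, where a client retry after an ambiguous failure can leave the same record in the file twice, and readers are expected to de-duplicate.

```python
class FlakyChunk:
    """Toy stand-in for a chunk of a distributed file."""
    def __init__(self):
        self.records = []

    def append(self, record, lose_ack=False):
        self.records.append(record)        # the write actually lands...
        if lose_ack:
            raise TimeoutError("ack lost")  # ...but the client never hears back

chunk = FlakyChunk()
try:
    # Ambiguous failure: the client can't tell whether the write succeeded.
    chunk.append(("id-42", "payload"), lose_ack=True)
except TimeoutError:
    # Safe response under at-least-once semantics: retry the append.
    chunk.append(("id-42", "payload"))

# The file now holds the record twice; a reader de-dupes by record ID.
unique = {rec_id: data for rec_id, data in chunk.records}
```

A conventional file system would be expected to prevent that duplicate; GFS instead pushes the cleanup onto the applications — fine for Google's own software, but, as Darcy says, unsuitable for many others.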

That has ramifications ...