Hadoop Distributed File System (HDFS)

Developed as the primary storage system of the Hadoop framework, HDFS handles large files of application data by splitting them into blocks, which are replicated and stored on different nodes of the cluster, thus achieving fault tolerance and high availability. HDFS is designed to run on commodity hardware, with emphasis on batch processing and high data-access throughput rather than low latency. It follows a write-once-read-many access model for files, which simplifies data coherency issues. HDFS uses a master/slave architecture: a single master node, the NameNode, manages the file system namespace and regulates client access to files, while the slave nodes, the DataNodes, store the actual data blocks. Although this design is simple and effective, the NameNode constitutes a single point of failure in an HDFS cluster. HDFS is widely used in production (Yahoo!, Facebook, Last.fm, etc. [1]).
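To make the access model concrete, here is a minimal sketch of client-side HDFS usage through Hadoop's Java FileSystem API. The NameNode URI (hdfs://namenode:9000), the file path, the replication factor, and the block size are illustrative assumptions, not values taken from this page.

<verbatim>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; substitute your cluster's URI.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/sample.txt");

        // Write once: the file is split into blocks and each block is
        // replicated across DataNodes. 3 replicas and a 128 MB block size
        // are assumed here (the classic HDFS default block size was 64 MB).
        FSDataOutputStream out =
                fs.create(file, true, 4096, (short) 3, 128L * 1024 * 1024);
        out.writeUTF("hello hdfs");
        out.close();

        // Read many: any client can stream the file back; the NameNode
        // resolves which DataNodes hold each block.
        FSDataInputStream in = fs.open(file);
        System.out.println(in.readUTF());
        in.close();

        // Inspect where the replicated blocks actually landed.
        FileStatus status = fs.getFileStatus(file);
        for (BlockLocation loc :
                fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println(loc);
        }
        fs.close();
    }
}
</verbatim>

Because a file is immutable once closed, concurrent readers never observe partial writes, which is the coherency simplification noted above.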

Ceph

References

  1. Applications and organizations using Hadoop

-- IoannisKonstantinou - 10 Jun 2010
-- ChristinaBoumpouka - 16 Jun 2010
