Hadoop Distributed File System (HDFS)

Developed as the primary storage system of the Hadoop framework, HDFS handles large files of application data by splitting them into blocks that are replicated and stored on different nodes of the cluster, thus achieving fault tolerance and high availability. HDFS is designed to be deployed on commodity hardware, with an emphasis on batch processing and high data-access throughput rather than low latency. It follows a write-once-read-many access model for files, which simplifies data coherency issues. HDFS uses a master/slave architecture in which a single master node, the NameNode, manages the file system namespace and regulates access to files by clients, while DataNodes store the file blocks and serve read and write requests. Although this design is simple and effective, the single NameNode is a single point of failure in an HDFS cluster. HDFS is widely used in production sites (Yahoo!, Facebook, Last.fm, etc. [1]).
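
As a rough illustration of this write-once-read-many model, the sketch below uses Hadoop's Java FileSystem API to create a file and read it back. The NameNode address (hdfs://namenode:9000) and the file path are placeholder values chosen for illustration, not part of any particular deployment.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.nio.charset.StandardCharsets;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsWriteReadExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Placeholder NameNode address; adjust to your cluster.
          conf.set("fs.defaultFS", "hdfs://namenode:9000");

          FileSystem fs = FileSystem.get(conf);
          Path file = new Path("/user/example/hello.txt"); // hypothetical path

          // Write once: HDFS splits the file into blocks and replicates them.
          try (FSDataOutputStream out = fs.create(file, true)) {
              out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
          }

          // Read many: the client streams the blocks back from the DataNodes.
          try (BufferedReader in = new BufferedReader(
                  new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
              System.out.println(in.readLine());
          }

          fs.close();
      }
  }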

Ceph

Ceph is a promising distributed file system designed to provide robust, open-source distributed storage with excellent performance. It offers seamless scalability: storage capacity is expanded simply by adding disks. By decoupling data and metadata operations and using a highly adaptive distributed metadata cluster architecture, Ceph can dynamically redistribute the file system hierarchy among the available metadata servers in order to balance load and make the most effective use of server resources. All data in Ceph is replicated across multiple Object Storage Devices (OSDs) and spread over a large number of disks to ensure strong reliability and fast recovery. The Ceph client was merged into the Linux kernel in version 2.6.34 [2].
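
Because the kernel client exposes Ceph as an ordinary POSIX file system, applications need no special API to use it. The short sketch below writes and reads a file through an assumed mount point (/mnt/ceph is purely hypothetical); any standard file I/O would work the same way, with Ceph replicating the data across OSDs underneath.

  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;

  public class CephMountExample {
      public static void main(String[] args) throws Exception {
          // Hypothetical mount point of a Ceph file system (kernel client).
          Path file = Paths.get("/mnt/ceph/example.txt");

          // Ordinary POSIX-style I/O through the mounted file system.
          Files.write(file, "Hello, Ceph".getBytes(StandardCharsets.UTF_8));
          String contents = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
          System.out.println(contents);
      }
  }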

References

  1. Applications and organizations using Hadoop
  2. Ceph Petabyte Scale Storage

-- IoannisKonstantinou - 10 Jun 2010 -- ChristinaBoumpouka - 16 Jun 2010

