
DFS Architecture

The DFS layer is a network of significant peers that possess file hierarchies (or namespaces), file data and raw storage, and share them according to the authoritative policy set by their owners.

Each peer may link foreign resources into its own resources. This aggregates distributed resources and makes them accessible along the hierarchy of a peer's own resources. This works like the links in Web pages that make foreign content accessible from a single place.
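The link mechanism can be illustrated with a minimal sketch. The `Peer` class, its `files` and `links` dictionaries, and the `resolve` method are hypothetical names chosen for illustration, and peers are plain in-process objects here rather than networked entities.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str  # hypothetical stand-in for the peer's cryptographic identity
    files: dict = field(default_factory=dict)  # local path -> content
    links: dict = field(default_factory=dict)  # local prefix -> (foreign peer, foreign prefix)

    def resolve(self, path, depth=0):
        """Return the content at `path`, following links into foreign peers."""
        if depth > 8:  # arbitrary guard against link cycles
            raise RuntimeError("too many link hops")
        if path in self.files:
            return self.files[path]
        for prefix, (peer, target) in self.links.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Rewrite the local prefix into the foreign peer's namespace.
                return peer.resolve(target + path[len(prefix):], depth + 1)
        raise FileNotFoundError(path)


alice = Peer("alice", files={"/docs/plan.txt": "alice's plan"})
bob = Peer("bob", files={"/notes.txt": "bob's notes"})
bob.links["/shared/alice"] = (alice, "/docs")

print(bob.resolve("/shared/alice/plan.txt"))
```

Bob's namespace exposes Alice's file under his own hierarchy, but the content still lives with, and is controlled by, Alice.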

The identity of a peer is defined cryptographically by a public-private key pair. Peers can authenticate themselves and their communication, and can protect their data and communication with encryption. The owner of an identity may be a physical person, a software entity, or a cooperating group of either. Owners set the access policy for the peer's resources and the policy for interaction with other peers. The DFS provides loose, eventual consistency semantics: conflicting concurrent accesses are not resolved by the DFS, but they can be detected.

Peers own and completely control resources that are locally available to them. These resources are globally identified with URIs constructed from the identity of their owner peers together with a local identification. Local identification can be a file path or a random string. These URIs are the references used in peer-to-peer links, or any other resource designation, making the identity of the peers significant and the specific peer indispensable.
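As a rough sketch of how such a URI could be built, the example below combines a peer identifier with a local identification. The `dfs://` scheme, the use of a SHA-256 fingerprint of the public key as the identifier, and all function names are assumptions made for illustration. The page does not specify the actual encoding.

```python
import hashlib
import secrets


def peer_id(public_key: bytes) -> str:
    # Assumption: the identity is rendered as a SHA-256 fingerprint of the public key.
    return hashlib.sha256(public_key).hexdigest()


def resource_uri(public_key: bytes, local_id: str) -> str:
    # Hypothetical layout: dfs://<peer fingerprint>/<local identification>
    return f"dfs://{peer_id(public_key)}/{local_id.lstrip('/')}"


def random_local_id() -> str:
    # Local identification as a random string (32 hex characters).
    return secrets.token_hex(16)


key = b"example public key bytes"  # placeholder bytes, not a real key
uri_by_path = resource_uri(key, "/home/docs/report.txt")
uri_by_handle = resource_uri(key, random_local_id())

print(uri_by_path)
print(uri_by_handle)
```

Because the peer identifier is part of every URI, a link can only be followed by contacting that specific peer.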

Returning to the web page analogy, owners can shape their peers' resources as freely as one edits one's own web page. Owners can also link to foreign resources but, as on the web, they cannot control the resources they link to. This is an important consideration for the DFS architecture. The model is particularly convenient for creating shared workspaces, where everyone's resources are available for reading while each owner writes only their own.

There are two types of resources: filesystem and storage. Filesystem resources are hierarchical file namespaces, including (logically) the content of files. Storage resources refer to raw disk storage. Note that a filesystem resource is a higher-level, content-based resource, whereas storage is a low-level, consumable computing resource.

Access to resources is requested with Actions, which are well-defined communication tokens. Actions always include identification for both the Authority, who owns the resource to be accessed, and the Agent, who requests the access, providing accountability throughout the network. Authorities have a different communication endpoint, called a Service, for each type of resource; the destination Service for an Action is either a Filesystem Service or a Storage Service. A resource URI encodes the Cryptographic Identity of the Authority, the Authority Service, and a Path or Handle for local identification (for Filesystem or Storage resources, respectively).
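The relationship between Actions, Services and resource URIs might be modelled as in the sketch below. The `dfs://authority/service/local` layout, the service tags `fs` and `storage`, the `operation` field and the `dispatch` helper are all hypothetical. Only the roles of Authority, Agent, Service and Path or Handle come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Service(Enum):
    FILESYSTEM = "fs"     # tag values are assumptions
    STORAGE = "storage"


@dataclass(frozen=True)
class ResourceURI:
    authority: str    # cryptographic identity of the owning peer (opaque string here)
    service: Service  # which Service of the Authority handles the resource
    local: str        # Path for Filesystem resources, Handle for Storage resources

    def __str__(self):
        return f"dfs://{self.authority}/{self.service.value}/{self.local}"

    @classmethod
    def parse(cls, text):
        scheme, sep, rest = text.partition("://")
        if scheme != "dfs" or not sep:
            raise ValueError("not a dfs URI")
        authority, service, local = rest.split("/", 2)
        return cls(authority, Service(service), local)


@dataclass(frozen=True)
class Action:
    agent: str           # identity of the peer requesting access
    operation: str       # e.g. "read"; operation names are hypothetical
    target: ResourceURI  # carries the Authority and the destination Service

    @property
    def authority(self):
        return self.target.authority


def dispatch(action, handlers):
    # Route the Action to the handler registered for its destination Service.
    return handlers[action.target.service](action)


uri = ResourceURI.parse("dfs://peerA/fs/docs/report.txt")
action = Action(agent="peerB", operation="read", target=uri)
handlers = {
    Service.FILESYSTEM: lambda a: f"fs:{a.target.local}",
    Service.STORAGE: lambda a: f"storage:{a.target.local}",
}
result = dispatch(action, handlers)
print(result)
```

Since every Action names both its Agent and its Authority, a handler always knows who is asking and on whose resource.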

Actions can be filtered through a pluggable external filter, a Policy Enforcement Point (PEP), which communicates with its own policy service to provide authorisation. Users may associate arbitrary data with files, which can be included in the Action tokens given to the PEP. The PEP can use this data as credentials.
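A minimal sketch of such a filter follows. The token layout, the `readers` attribute and the policy rule itself are invented for illustration. A real PEP would consult an external policy service rather than a local function.

```python
def build_token(agent, authority, operation, path, file_attrs):
    """Build an action token, attaching any user data associated with the file."""
    return {
        "agent": agent,
        "authority": authority,
        "operation": operation,
        "path": path,
        "attrs": dict(file_attrs.get(path, {})),
    }


def policy_service(token):
    # Hypothetical rule: the owner may do anything; other agents may only
    # read files whose "readers" attribute lists them.
    if token["agent"] == token["authority"]:
        return True
    readers = token["attrs"].get("readers", ())
    return token["operation"] == "read" and token["agent"] in readers


def filtered(handler, pep):
    """Wrap a handler so that every token passes through the PEP first."""
    def wrapper(token):
        if not pep(token):
            raise PermissionError(f"{token['agent']} may not {token['operation']} {token['path']}")
        return handler(token)
    return wrapper


attrs = {"/docs/a.txt": {"readers": ["bob"]}}
guarded = filtered(lambda token: "content of " + token["path"], policy_service)

print(guarded(build_token("bob", "alice", "read", "/docs/a.txt", attrs)))
```

The data attached to the file travels inside the token, so the policy decision needs no access to the filesystem itself.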

Aggregation by cross-peer Filesystem links logically brings remote files into a local directory. Similarly, remote Storage resources can be logically joined into a single, aggregated virtual image, a Storage Pool, which is managed by the VBS subsystem. Storage allocations for new or expanding files are made from such pools. Every filesystem service is associated with a primary storage pool. Storage can be offered to specific peers for use in their filesystems by registering storage resources with the specific peer's primary storage pool.
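The pooling idea can be sketched as follows. The `StoragePool` class, its method names and the first-fit allocation strategy are assumptions. The page does not describe how the VBS actually places allocations.

```python
class StoragePool:
    """Toy aggregate of storage resources offered by different peers."""

    def __init__(self):
        self.free = {}  # storage resource URI -> free bytes

    def register(self, resource, capacity):
        # A peer offers storage by registering a resource with the pool.
        self.free[resource] = self.free.get(resource, 0) + capacity

    def total_free(self):
        return sum(self.free.values())

    def allocate(self, size):
        """Return a list of (resource, bytes) extents covering `size` bytes."""
        if size <= 0:
            raise ValueError("size must be positive")
        if size > self.total_free():
            raise RuntimeError("storage pool exhausted")
        extents = []
        # Assumed strategy: fill resources in registration order (first fit).
        for resource in self.free:
            take = min(size, self.free[resource])
            if take:
                self.free[resource] -= take
                extents.append((resource, take))
                size -= take
            if size == 0:
                break
        return extents


pool = StoragePool()
pool.register("dfs://peerA/storage/disk0", 100)
pool.register("dfs://peerB/storage/disk7", 50)
extents = pool.allocate(120)
print(extents, pool.total_free())
```

A single allocation may span resources owned by different peers, which is what makes the pool behave like one virtual image.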

Storage allocation is a distinct function from storage access. Allocation is performed by the VBS, while access is requested by the DFS Clients and is also served, initially, by the VBS. The architecture allows pluggable modules that implement allocation and access for different kinds of storage servers, such as HTTP, FTP, BitTorrent, or Grid-specific ones.
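One way to picture the pluggable modules is an abstract interface that separates allocation from access, with one implementation per kind of storage server. The interface below, the `BACKENDS` registry and the in-memory backend are illustrative assumptions. No real HTTP, FTP or BitTorrent module is shown.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical plug-in interface: allocation and access are separate calls."""

    @abstractmethod
    def allocate(self, size):
        """Reserve `size` bytes and return a handle."""

    @abstractmethod
    def write(self, handle, offset, data):
        """Store `data` at `offset` inside the allocation."""

    @abstractmethod
    def read(self, handle, offset, length):
        """Return `length` bytes starting at `offset`."""


class MemoryBackend(StorageBackend):
    """In-memory stand-in for a real storage server."""

    def __init__(self):
        self.blocks = {}

    def allocate(self, size):
        handle = f"mem-{len(self.blocks)}"
        self.blocks[handle] = bytearray(size)
        return handle

    def write(self, handle, offset, data):
        block = self.blocks[handle]
        if offset < 0 or offset + len(data) > len(block):
            raise ValueError("write outside allocation")
        block[offset:offset + len(data)] = data

    def read(self, handle, offset, length):
        return bytes(self.blocks[handle][offset:offset + length])


# Registry of available modules, keyed by a hypothetical server kind.
BACKENDS = {"memory": MemoryBackend}


def open_backend(kind):
    return BACKENDS[kind]()


backend = open_backend("memory")
handle = backend.allocate(16)
backend.write(handle, 4, b"data")
print(backend.read(handle, 4, 4))
```

Supporting another kind of server would mean registering one more class, without changing the callers.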

As Client peers navigate through the network, they may cache remote resources according to policy. Cached resources persist for disconnected operation. Attempts at remote access can also be configured to be persistently logged, so that they can be retried after network recovery or application restart.
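The sketch below shows caching combined with a persistent retry log. The `CachingClient` class, the JSON-lines log format and the `FlakyRemote` test double are assumptions. For brevity, the cache is held in memory and only the retry log is written to disk.

```python
import json
import os


class FlakyRemote:
    """Test double for a remote peer that can go offline."""

    def __init__(self):
        self.store = {}
        self.online = True

    def get(self, uri):
        if not self.online:
            raise ConnectionError("remote unreachable")
        return self.store[uri]

    def put(self, uri, data):
        if not self.online:
            raise ConnectionError("remote unreachable")
        self.store[uri] = data


class CachingClient:
    def __init__(self, remote, log_path):
        self.remote = remote
        self.cache = {}           # in-memory here; the DFS cache is persistent
        self.log_path = log_path  # failed writes are appended here, one JSON object per line

    def read(self, uri):
        try:
            self.cache[uri] = self.remote.get(uri)
        except ConnectionError:
            if uri not in self.cache:
                raise  # nothing cached, so the read cannot be served offline
        return self.cache[uri]

    def write(self, uri, data):
        self.cache[uri] = data
        try:
            self.remote.put(uri, data)
        except ConnectionError:
            with open(self.log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps({"uri": uri, "data": data}) + "\n")

    def replay(self):
        """Retry logged writes in order; return how many succeeded."""
        if not os.path.exists(self.log_path):
            return 0
        with open(self.log_path, encoding="utf-8") as log:
            entries = [json.loads(line) for line in log if line.strip()]
        remaining = []
        for index, entry in enumerate(entries):
            try:
                self.remote.put(entry["uri"], entry["data"])
            except ConnectionError:
                remaining = entries[index:]  # keep this and all later entries
                break
        with open(self.log_path, "w", encoding="utf-8") as log:
            for entry in remaining:
                log.write(json.dumps(entry) + "\n")
        return len(entries) - len(remaining)
```

Because the log lives in a file, a new `CachingClient` created after an application restart can call `replay()` on the same path and pick up where the previous one stopped.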

DFS peering mechanisms include primitives for publish-subscribe communication and remote event notification. The mechanisms are used to handle the significant asynchrony in the network, which is further amplified by disconnected operation. Through the DFS Client interface, applications can also access these primitives.
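A small in-process sketch of publish-subscribe with deferred delivery is given below. The `EventBus` and `Subscriber` classes and their methods are hypothetical and do not reflect the actual DFS Client interface. Events for a disconnected subscriber are queued and delivered on reconnection.

```python
from collections import defaultdict, deque


class Subscriber:
    def __init__(self, callback):
        self.callback = callback
        self.connected = True
        self.pending = deque()  # events received while disconnected

    def deliver(self, event):
        if self.connected:
            self.callback(event)
        else:
            self.pending.append(event)

    def reconnect(self):
        self.connected = True
        # Flush queued events in the order they were published.
        while self.pending:
            self.callback(self.pending.popleft())


class EventBus:
    def __init__(self):
        self.topics = defaultdict(list)  # topic -> subscribers

    def subscribe(self, topic, callback):
        subscriber = Subscriber(callback)
        self.topics[topic].append(subscriber)
        return subscriber

    def publish(self, topic, event):
        for subscriber in self.topics.get(topic, ()):
            subscriber.deliver(event)


bus = EventBus()
received = []
sub = bus.subscribe("dfs://peerA/fs/docs", received.append)

bus.publish("dfs://peerA/fs/docs", "created report.txt")
sub.connected = False  # simulate disconnected operation
bus.publish("dfs://peerA/fs/docs", "modified report.txt")
sub.reconnect()
print(received)
```

The publisher never blocks on an unreachable subscriber, which is the asynchrony the notification primitives are meant to absorb.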

Topic revision: r2 - 2008-03-07 - GeorgiosTsoukalas
