Reliability of modular mesh-connected intelligent storage brick systems

IBM Journal of Research and Development, Mar-May 2006, by Fleiner, C., Garner, R. B., Hafner, J. L., Rao, K. K., et al.

A key objective of the IBM Intelligent Bricks project is to create a highly reliable system from commodity components. We envision such systems to be architected for a service model called fail-in-place or deferred maintenance. By delaying service actions, possibly for the entire lifetime of the system, management of the system is simplified. This paper examines the hardware reliability and deferred maintenance of intelligent storage brick (ISB) systems assuming a mesh-connected collection of bricks in which each brick includes processing power, memory, networking, and storage. On the basis of Monte Carlo simulations, we quantify the fraction of bricks that become unusable by a distributed data redundancy scheme due to degrading internal bandwidth and loss of external host connectivity. We derive a system hardware reliability expression and predict the length of time ISB systems can operate without replacement of failed bricks. We also show via a Markov analysis the level of fault tolerance that is required by the data redundancy scheme to achieve a goal of less than two data loss events per exabyte-year due to multiple failures.

Introduction

The Intelligent Bricks project investigates storage systems based on a modular brick architecture with the objectives of simplifying system management, providing a large scaling range, and creating a reliable system from commodity components. Storage servers built with a single type of module, or brick, are attractive in terms of simplicity, scalability, and cost. Bricks include processing, memory, networking, and storage sufficient to run a distributed software system that delivers higher data reliability than that offered by the underlying hardware.

A key property of an intelligent storage brick (ISB) system is its fail-in-place or deferred-maintenance architecture: By over-provisioning or adding bricks while operating, hardware maintenance can be delayed for several years, possibly for the entire lifetime of the system. The distributed system software is responsible for automatically invoking spare disks or bricks as components fail. The only maintenance task users are expected to perform is to physically add bricks to meet growing capacity requirements.

This paper presents quantitative insights into the operating characteristics of mesh-connected ISB systems, in which bricks communicate only with physically adjacent bricks. We characterize such systems by the fraction of unusable bricks due to degrading internal bandwidth and external host connectivity, and a reliability expression that approximates the length of time that ISB systems can operate without replacement of failed bricks. Our goal is that ISB systems provide nearly 100% data availability, no ongoing hardware maintenance actions for several years, and a very low probability of data loss due to multiple failures. This paper is a companion to [1], which presents the overall ISB system and an operational 3 × 3 × 3-brick prototype.

Related work

The approach of distributing data across independent machines to build scalable storage systems has been explored in DataMesh [2], FAB [3], Self-* [4], Petal [5], and OceanStore [6]. Several companies are shipping products based on distributed data redundancy, including Panasas [7], Pivot3, LeftHand Networks, and Isilon. However, none of these approaches focuses on fail-in-place or deferred maintenance. The Panasas system, while implementing a distributed RAID 5 scheme, is oriented more toward delivering high performance. DataMesh [2] was a two-dimensional mesh-connected storage server that most closely resembled our ISB system and introduced concepts of distributed redundancy, fault isolation, and recovery. The more recent FAB project [3] proposed to build a brick storage system from commodity parts. The Self-* project [4] focuses on simplifying administration by using a brick storage system, including mechanisms to schedule resources, classify files, and manage replicas.

An initial analysis of over-provisioning for capacity and bandwidth in an ISB system was described by Kirkpatrick et al. [8]. They conservatively defined usable bricks as those that were connected to at least two or three other bricks. Our approach places data on bricks that may have only a single remaining connection to other bricks while avoiding possible set partitioning due to brick failures.

Brick usability in degrading cubes

A pristine cube (i.e., an initial cube with no failed bricks) contains N bricks arranged as a two-dimensional (2D) (h × h) or three-dimensional (3D) (h × h × h) nearest-neighbor network mesh. Each brick contributes storage, network bandwidth, memory, and processing resources. The bricks run system software that manages the stored data and implements a distributed RAID (dRAID) scheme, in which data is copied or encoded into multiple chunks, each placed on a distinct brick. As bricks progressively fail, a pristine cube slowly declines in performance and capacity. In this section we establish operating ranges of usable bricks in 2D and 3D mesh-connected ISB systems.
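
To make the notion of a degrading cube concrete, the following is a minimal Monte Carlo sketch, not the simulation used in this paper. It assumes that a surviving brick counts as usable when it belongs to the largest connected group of surviving bricks; this is a simplified connectivity-only proxy that ignores the internal bandwidth and external host connectivity criteria the paper considers. The function names (`neighbors`, `largest_component`, `usable_fraction`) and the parameters `failed_fraction`, `trials`, and `seed` are hypothetical.

```python
import itertools
import random
from collections import deque


def neighbors(cell, h, dims):
    """Yield nearest-neighbor coordinates of `cell` inside a mesh with h bricks per side."""
    for axis in range(dims):
        for step in (-1, 1):
            coords = list(cell)
            coords[axis] += step
            if 0 <= coords[axis] < h:
                yield tuple(coords)


def largest_component(alive, h, dims):
    """Return the size of the largest connected group of surviving bricks (BFS)."""
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            cell = queue.popleft()
            size += 1
            for nb in neighbors(cell, h, dims):
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def usable_fraction(h, dims, failed_fraction, trials=200, seed=0):
    """Average fraction of the N original bricks left in the largest surviving group.

    Assumption (not from the paper): bricks fail uniformly at random, and
    "usable" means "in the largest connected component of surviving bricks".
    """
    rng = random.Random(seed)
    cells = list(itertools.product(range(h), repeat=dims))
    n = len(cells)
    n_failed = round(failed_fraction * n)
    total = 0.0
    for _ in range(trials):
        failed = set(rng.sample(cells, n_failed))
        alive = set(cells) - failed
        total += largest_component(alive, h, dims) / n
    return total / trials
```

Calling `usable_fraction(h, 2, f)` and `usable_fraction(h, 3, f)` for the same failed fraction `f` gives a rough comparison of how 2D and 3D meshes of the same side length fragment as bricks fail, under the stated assumptions.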


 
