CAS: storage for fixed content; getting the most out of all your information - Storage Networking - content addressed storage

Computer Technology Review, Oct, 2002 by Tom Heiser

A study conducted by the University of California at Berkeley, entitled "How Much Information," stated that the world will produce nearly 12 billion gigabytes of information this year, of which more than half is unchanging digital assets, otherwise known as "fixed content." Fixed content is retained for active reference and value, and it takes many forms: critical business, legal, and reference documents; X-rays; email attachments; check images; broadcast content; satellite imagery; and much more. Unlike databases or files, which are constantly changed or updated, fixed content derives its value from the combined attributes of expanded use, authenticity, and long life.

Once relegated to storage archives or file cabinets, fixed content is being driven online. This shift is fueled by internal needs, regulatory requirements, digitization across virtually all industries, and the desire to leverage this content into new services and revenue streams. However, the business value of fixed content is largely untapped today, because securely storing and managing large amounts of fixed content online for years or decades has been cost-prohibitive, especially when organizations attempt to provide access to unlimited numbers of users at Web speeds.

Traditional disk storage systems with block or file access schemes are well suited for storage of tens to hundreds of terabytes of data typical of collaborative applications. But these systems cannot easily and cost-effectively scale and manage massive fixed content repositories that can reach hundreds of terabytes to petabytes in size. Balancing the logistics of data placement and capacity scaling with the need to authenticate data over the content's life, no matter how long that life may be (months, years, or decades), also presents a challenge to traditional storage. To solve the quandary of managing and accessing large amounts of fixed content, a new category of networked storage has emerged: "content addressed storage," or CAS.

Pioneered by EMC Corporation, Centera is the industry's first implementation of CAS. CAS is optimized for managing, sharing, and protecting fixed content over its life, just as SAN has been optimized for block data and NAS has been optimized for files.

The content explosion is increasing storage capacity requirements in some industries by 100% or more. Some industries where CAS will have the most immediate impact are:

Healthcare: CAS effectively eliminates the traditional barriers to widespread distribution and online availability of crucial digitized medical information such as X-rays, MRIs, and medical records. CAS enables management costs to remain flat as digital X-rays and other large medical images accumulate, while ensuring long-term retention and authenticity of these digital images.

Financial Services: CAS addresses two major needs: 1) adherence to stringent regulations that require long-term content integrity, and 2) cost-effective online access to financial information with assured content integrity, which allows the information to be repurposed to improve customer service and deliver new revenue-producing services and products.

Film, Broadcast, and Media: Video, film, and audio content are the media and entertainment industry's key assets, but only if their reuse and sale can be managed and protected. A CAS system is an excellent digital asset repository because it simultaneously addresses long-term retention, protected ease of use, and verified content authenticity.

Behind the Technology

EMC's Centera is an integrated software and hardware solution purpose-built for the storage needs of fixed content. The vast majority of customer value comes from Centera's software, which dramatically improves the ease of use and management of fixed content.

When an object is stored, Centera calculates a 128-bit claim check from the object's binary representation. Centera then translates the 128-bit result into a unique 27-character identifier, called the content address. The content address is derived from, and is unique to, that individual piece of content. Content addressing distinguishes Centera from other storage technologies (all of which are based on location addressing) because it eliminates the need to understand and manage the physical or logical location of information on the storage medium.
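As a rough illustration of the idea, the following Python sketch derives an address from an object's bytes. The use of MD5 as the 128-bit calculation and of unpadded Base32 as the text encoding are assumptions made for illustration only; the article does not describe Centera's actual algorithm or encoding, and this sketch yields 26 characters rather than Centera's 27.

```python
import base64
import hashlib


def content_address(data: bytes) -> str:
    """Derive a short text identifier from an object's binary content.

    Assumption: MD5 stands in for the unspecified 128-bit calculation,
    and unpadded Base32 stands in for the unspecified text encoding.
    """
    digest = hashlib.md5(data).digest()  # 16 bytes = 128 bits
    return base64.b32encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    xray = b"...bytes of a digitized X-ray..."
    print(content_address(xray))
    # Identical content always maps to the same address.
    print(content_address(xray) == content_address(bytes(xray)))
```

Note that nothing in the function refers to a volume, directory, or file name; the identifier depends only on the bytes themselves.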

Centera links the fixed content object to the application and user via an intermediate data structure called a c-clip descriptor file (CDF), which contains time-stamp information, any application-specified metadata, and the content address of the stored object. The CDF is itself stored and given a content address of its own. It is the CDF's content address, not the content object's content address, that the application holds as the virtual "claim check."
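The two-level arrangement can be sketched in a few lines of Python. Everything here is hypothetical: the class name ToyContentStore, its methods, the JSON layout of the descriptor, and the use of an MD5 hex digest as the address are illustrative assumptions, not Centera's API or on-disk format.

```python
import hashlib
import json
import time


def address_of(data: bytes) -> str:
    # Assumption: an MD5 hex digest stands in for the real address format.
    return hashlib.md5(data).hexdigest()


class ToyContentStore:
    """Hypothetical in-memory store illustrating the CDF indirection."""

    def __init__(self):
        self._objects = {}  # address -> bytes

    def _put(self, data: bytes) -> str:
        addr = address_of(data)
        self._objects[addr] = data
        return addr

    def store(self, content: bytes, metadata: dict) -> str:
        """Store content plus a descriptor; return the descriptor's address."""
        content_addr = self._put(content)
        cdf = {
            "timestamp": time.time(),
            "metadata": metadata,
            "content_address": content_addr,
        }
        cdf_bytes = json.dumps(cdf, sort_keys=True).encode("utf-8")
        return self._put(cdf_bytes)  # this is the application's claim check

    def retrieve(self, claim_check: str):
        """Follow the descriptor to the content it points at."""
        cdf = json.loads(self._objects[claim_check].decode("utf-8"))
        return cdf["metadata"], self._objects[cdf["content_address"]]


if __name__ == "__main__":
    store = ToyContentStore()
    check = store.store(b"scanned check image", {"account": "example"})
    print(store.retrieve(check))
```

The application keeps only the value returned by store(); the address of the content itself stays inside the descriptor.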

The advantages of content addressing:

Assured content authenticity: A content object can have only one content address. Any change to content is detected because it results in a different content address.

A globally unique, location-independent identifier: Referring to content by its content address yields a reference that does not depend on where the content is stored. The content address is independent of operating systems, file systems, and applications. For example, a user can store an object in Centera using a document management application running on Windows NT, and retrieve it through a legal application running on Sun Solaris.
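The authenticity property described above can be illustrated with a small Python sketch that recomputes the address on every read. The function names, the dictionary used as storage, and the MD5 hex digest used as the address are assumptions for illustration; the article does not say how or when Centera performs its own checks.

```python
import hashlib


def address_of(data: bytes) -> str:
    # Assumption: an MD5 hex digest stands in for the real content address.
    return hashlib.md5(data).hexdigest()


def read_verified(objects: dict, address: str) -> bytes:
    """Return the stored bytes only if they still match their address."""
    data = objects[address]
    if address_of(data) != address:
        raise ValueError("content no longer matches its content address")
    return data


if __name__ == "__main__":
    original = b"signed contract, final version"
    addr = address_of(original)
    objects = {addr: original}

    print(read_verified(objects, addr) == original)

    # Simulate tampering with the stored bytes.
    objects[addr] = b"signed contract, altered version"
    try:
        read_verified(objects, addr)
    except ValueError as err:
        print("rejected:", err)
```

Because the address is computed from the bytes alone, the same check works regardless of which operating system or application performs the read.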

 

Content provided in partnership with Thompson Gale