Archival data has a new mission: Critical; it's not what it used to be

Computer Technology Review, Feb, 2003 by Fred Moore

Having chaired two panels at storage conferences in January, I found it clear at both that the category of archival data is quickly demanding a new wave of focus. For years, the term "archival data" described data in the long-term, decreasing-value stage of its life. As customers and vendors now identify their more pressing storage needs going forward, the data in the category of "long-term retention" is being viewed differently than in the past. Historically, when data reached archival status, it had reached its final state before being deleted, ending the data lifecycle. Archiving almost always assumed that the value of data decreased as it aged. This is no longer the case. Recently, data lifecycle management has taken on renewed emphasis, and at times it seems like all data is critical. The second wave of archival storage management is underway.

Many organizations are now facing increasing regulatory pressure to comply with federal mandates for email, medical, insurance, legal, financial and government classified data. In addition, over half of the digital data being generated annually today (this is approximately one exabyte, or 1 x 10^18 bytes) now falls into the category of "fixed content," meaning that the data doesn't change after it is initially created. Fixed content is sometimes referred to as "reference data," "rich media," or archival data. It includes storage-intensive content such as critical business application data, complex legal and reference documents, medical data, email attachments, blueprints, satellite imagery, security surveillance, check images, and broadcast content, among others, all of which is seldom if ever altered.

The assumption that older or aged data has lost its value no longer holds for several specific vertical markets. New applications and a variety of legal and business requirements are driving many businesses to re-examine their archival policies. One of the most visible examples of the increasingly critical value of archival data lies with the HIPAA (Health Insurance Portability and Accountability Act) requirements. Not only does HIPAA require health providers to preserve data for a yet-to-be-determined time period, but the failure to protect critical patient data presently carries penalties ranging up to $25,000 per violation. The mere threat of fines and other penalties for noncompliance is encouraging storage administrators to make sure that an increasing amount of archival application data will be kept indefinitely for future reference. The PACS application (Picture Archiving and Communications System), which captures and stores radiology information and other types of medical images, is a primary component of the HIPAA requirement. Email archives also fall into this category and face increasing pressure to be retained indefinitely for legal reasons. As a general rule used for common email retention policies, 80 percent of email can be immediately archived. Email will soon require HSM (hierarchical storage management) on steroids to meet the archival demands! Given today's legal, economic and political climate, the value of archival data has never been higher.

The increased emphasis on preserving critical archive data requires a different set of storage attributes than previous archival management schemes did.

Archival Storage and Data Characteristics

* Large-scale storage capacity needed, scalable to petabytes (1 x 10^15 bytes)

* Indefinite data retention periods required (measured in years), as the data must be preserved, though not necessarily the media it resides on

* Archive data normally has low access and reference requirements but relatively high data transfer rate (bandwidth) requirements

* Much archival data is static in nature ("fixed content"), is unstructured, and is stored in a variety of formats

* WORM (Write-Once-Read-Many) capability is increasingly desirable for legal reasons

* Random and sequential access required based on the application

* Delayed initial access time is acceptable (from seconds up to a few minutes)

* Archive data can involve local and remote access (location independent) with many users in many locations

* Needs a data classification taxonomy to enable unique content search and access, as some archival searches can cost six figures

* Multiple copies of archival data are needed given the criticality and increasing value of data

* Device security and data security (intrusion protection, authenticity) are required for archival data management

* Archive data requires its own policies consistent with regulatory practices for each industry category
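
To make the WORM and authenticity attributes in the list above more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor's product: the class name WormStore and its methods are hypothetical, the store lives in memory, and a SHA-256 digest stands in for whatever authenticity check a real archive would use.

```python
import hashlib
from typing import Dict


class WormStore:
    """Illustrative in-memory write-once-read-many store (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, bytes] = {}
        self._digests: Dict[str, str] = {}

    def write(self, key: str, data: bytes) -> str:
        """Store data under key exactly once and return its SHA-256 digest."""
        if key in self._records:
            # Write-once: an existing record may never be replaced.
            raise PermissionError(f"record {key!r} has already been written")
        self._records[key] = bytes(data)
        digest = hashlib.sha256(data).hexdigest()
        self._digests[key] = digest
        return digest

    def read(self, key: str) -> bytes:
        """Return the record after checking it still matches its digest."""
        data = self._records[key]
        if hashlib.sha256(data).hexdigest() != self._digests[key]:
            raise ValueError(f"record {key!r} no longer matches its digest")
        return data
```

A second write to the same key raises PermissionError, which is the write-once behavior the list calls for. A real archive would also need the multiple copies, device security and retention policies listed above, none of which this sketch attempts.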

The data lifecycle is traditionally described as having four distinct categories. In each case, we continue to observe that the probability of reuse of data decreases as data ages. In the past, the value of data most often decreased as data aged.

1) The active cycle -- this period often lasts for 30 days, typically on disk storage (P >= 0.5)

2) The reference cycle -- this typically lasts for 60 days, typically on disk and automated tape storage (P >= 0.1)

3) The archive cycle -- this period often lasts up to seven years, typically on automated tape, though a new class of archival disks, such as the 160GB and 320GB ATA disks for fixed content storage, is gaining momentum (P <= 0.01)
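
The stages listed above can be sketched as a simple age-based lookup. This is a rough illustration under stated assumptions: the stage durations are treated as cumulative (30 days active, then 60 days of reference through day 90, then archive up to seven years), P is read as the probability of reuse, and the names Stage, STAGES and stage_for_age are hypothetical. Only the stages shown above are modeled; anything older returns None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    max_age_days: int       # assumed cumulative upper bound on data age
    typical_media: str
    reuse_probability: str  # P value as quoted in the list


# Assumption: durations stack, so the reference stage ends at day 30 + 60.
STAGES = (
    Stage("active", 30, "disk", "P >= 0.5"),
    Stage("reference", 90, "disk and automated tape", "P >= 0.1"),
    Stage("archive", 7 * 365, "automated tape or ATA archival disk", "P <= 0.01"),
)


def stage_for_age(age_days: int) -> Optional[Stage]:
    """Return the lifecycle stage for data of the given age, or None."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for stage in STAGES:
        if age_days <= stage.max_age_days:
            return stage
    return None  # older than the stages modeled here
```

For example, stage_for_age(45) returns the reference stage. Real policies would rarely key on age alone; as the article argues, regulatory requirements can keep old data critical regardless of how seldom it is reused.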

 


Content provided in partnership with Thomson Gale