Long Term Data Preservation
Computer Technology Review, Oct, 1999 by Fred Moore
The past 40 years have seen the information revolution accelerate, and the new millennium clearly promises to be labeled the Information Age. Estimates now indicate that half to two-thirds of the world's data is being "born digital," meaning that its original occurrence was in a digital format. By the year 2004, it is projected that as much as 14% of the known data in the world will be captured in machine-readable digital format. Nearly 86% of the world's data will remain on paper, microfiche, various films, or other non-machine-readable formats.
The digital data storage technologies map into a hierarchy consisting of fixed and removable media products that are making mass storage, data archiving, and electronic data vaulting affordable realities. Automated libraries using magnetic tape, possibly small form-factor magnetic disks, the DVD, and possibly other emerging storage mediums will become the foundation containing most of this mass-storage growth. New and emerging digital applications will continue to fuel a period of explosive growth for storage well into the next century as terabyte-plus databases for a variety of new applications, data warehouses, electronic voice, and video mail systems all drive up requirements. The storage demand created by the public Internet has yet to be determined and will generate countless new application and e-business developments along with many storage management challenges.
Today's assessment of high-capacity data storage systems identifies data preservation as a looming crisis of dramatic proportion. With so much activity centered on the collection, storage, retrieval, and transmission of data, we are faced with trying to understand how to preserve and archive this data on a long-term basis. Of the world's digital data, approximately 90% resides on mass storage or removable media technology such as magnetic tape and optical disks. The other 10% resides at the higher end of the storage hierarchy on fixed magnetic disk subsystems and solid-state memory devices. The rate of change and subsequent obsolescence of high-capacity storage systems is now a strategic area of focus for suppliers and customers alike. Storage media typically will last longer than the storage hardware components, potentially leaving large volumes of data in legacy formats when hardware upgrades occur.
A report from the National Media Laboratory in St. Paul, MN, published in 1998, indicated that magnetic tape formats would last between ten and twenty years if kept within environmental guidelines for temperature and humidity. The report also noted that human-readable media could remain readable for much longer periods of time than computer-based storage media. Newspapers can be read clearly for ten to thirty years, depending on environmental conditions, and microfilm can enjoy a usable life of one hundred years or more. The industry's optimal data storage offerings readily provide for backward compatibility in reading older formats on the newer storage devices. At some point, however, the data will have to migrate to a newer technology, as software cannot continue to support older formats forever. What is the value of having media that is readable twenty-five years or more from now when no software, replacement parts, diagnostics, or maintenance services for those storage devices will exist that understand the data format or recognize the media type?
With digital content increasing on the order of 60% annually or more across all computing platforms, successful long-term data survivability relies on scalable storage architectures with high-bandwidth parallel data paths so that data migration can occur in a reasonable time. A 20MB/sec SCSI channel moving data at its maximum rated speed could move or, in this case, migrate 72GB of data per hour. A terabyte of data would take 13 hours and 53 minutes to migrate (at rated speeds) to a newer technology. It would take a week to move just over twelve TB at 20MB/sec. Using multiple data paths in parallel can bring a server and its corresponding application to a halt for a considerable amount of time. Is this acceptable? Wouldn't a SAN be an ideal solution for this ongoing long-term storage activity? The need for both high bandwidth and parallel transfer capability for storage subsystems becomes obvious, particularly with the amount of digital content growing at 60% or more annually.
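The migration figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a sizing tool: the function names are hypothetical, decimal units are assumed (1 GB = 1,000 MB, 1 TB = 1,000 GB) because those match the article's numbers, and the `paths` parameter assumes ideal linear scaling across parallel channels, which real subsystems rarely achieve.

```python
def gb_per_hour(rate_mb_per_sec: float) -> float:
    """Gigabytes moved in one hour at a sustained rate (decimal units)."""
    return rate_mb_per_sec * 3600 / 1000


def migration_hours(data_gb: float, rate_mb_per_sec: float, paths: int = 1) -> float:
    """Hours to migrate data_gb over `paths` channels of equal rated speed.

    Assumption: throughput scales linearly with the number of paths.
    """
    total_mb = data_gb * 1000
    seconds = total_mb / (rate_mb_per_sec * paths)
    return seconds / 3600


# One 20MB/sec SCSI channel at rated speed
print(gb_per_hour(20))                      # 72.0 GB per hour

hours = migration_hours(1000, 20)           # one terabyte
print(f"{int(hours)} h {round((hours - int(hours)) * 60)} min")  # 13 h 53 min

# Data moved in one week (168 hours) on the same channel
print(gb_per_hour(20) * 24 * 7)             # 12096.0 GB, just over twelve TB

# Hypothetical: the same terabyte spread over four parallel channels
print(round(migration_hours(1000, 20, paths=4), 2))
```

Under these idealized assumptions, each added path divides the migration window proportionally, which is the arithmetic behind the case for parallel transfer capability.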
Some scientific systems are now acquiring as much as five terabytes of digital data per day. By comparison, data transfer speeds have increased by an average of only about 15 to 20% annually over the past ten years, and progress has been boosted mainly by the more recent jump from 20MB/sec SCSI to 100MB/sec Fibre Channel. Storage capacities will continue to grow much faster than corresponding transfer capability, and the need for an effective data migration strategy is quickly becoming more widespread. It is clear that a "capacity is everything" focus without understanding performance and throughput capability is not a strategic view.
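To illustrate how quickly this gap compounds, the sketch below projects the relative time needed to migrate an entire data store over a single path. It is an illustration under stated assumptions only: the 60% and 15 to 20% rates from the text are treated as constant compound annual rates, the data to be migrated is assumed to grow in step with total content, and the function name is hypothetical.

```python
def migration_time_factor(years: int,
                          data_growth: float = 0.60,
                          speed_growth: float = 0.20) -> float:
    """Factor by which single-path migration time grows after `years`.

    Assumption: data volume and transfer speed both compound annually
    at fixed rates, so time (volume / speed) grows by their ratio.
    """
    return ((1 + data_growth) / (1 + speed_growth)) ** years


for year in (1, 5, 10):
    best = migration_time_factor(year, speed_growth=0.20)   # faster speed growth
    worst = migration_time_factor(year, speed_growth=0.15)  # slower speed growth
    print(f"year {year}: migration takes {best:.1f}x to {worst:.1f}x as long")
```

With these assumed rates, the migration window roughly quadruples within five years even in the more favorable case, which is why capacity alone is a poor planning metric.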
