Understanding Online Archiving

Computer Technology Review, Jan. 2000, by Paul Wang

Online archiving provides more efficient and faster access, plus major disk space savings without performance penalties

System administrators have historically relied on offline archiving for data backup and storage. In a typical scenario, offline archiving is a manual process that moves data to media no longer connected to the system. When that data is needed again, a similar manual process must be performed to bring it back onto the system so that it can be used.

Besides being time-consuming, manual archiving has other drawbacks. It hurts user productivity, because work stalls while system administrators wait for an operator to locate the right tape and load it. Locating the desired files can also be a challenge when there are hundreds or even thousands of on-site and off-site tapes. Once the files are found, the data on the tapes may be corrupted, because tape media have a limited shelf life and degrade through oxidation. Finally, because offline archiving is so labor-intensive, it increases management and help-desk costs.

The other traditional approach, nearline archiving, moves data to slower media such as robotic tape libraries and laser-optical or magneto-optical jukeboxes. Nearline archiving is also referred to as Hierarchical Storage Management (HSM). Retrieving data from nearline devices is slow, but much faster than retrieving it from offline archives, because no manual step is involved.

An HSM system selects files according to a policy and archives them. Archiving is a multi-step process: the files are compressed and then moved to the nearline storage device. When a user or application later attempts to access an archived file, a time lag occurs. The HSM must find the device and media where the file is located and instruct the device to load the appropriate media. Once the media is loaded, the HSM retrieves the file from it and decompresses it, and only then is the file available.
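The select, compress, migrate, and recall steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular HSM product works: the 60-day inactivity threshold, the use of last-access time as the policy, gzip as the compression step, and a second directory standing in for the nearline device are all hypothetical. Real HSM products also handle locating and loading media and making recall transparent to the application; the sketch reduces recall to an explicit function call.

```python
import gzip
import os
import shutil
import time

# Hypothetical policy: files not accessed for this many days are archived.
INACTIVE_DAYS = 60
SECONDS_PER_DAY = 86400


def select_for_archive(primary_dir, now=None, inactive_days=INACTIVE_DAYS):
    """Policy step: return names of regular files whose last access is older than the threshold."""
    if now is None:
        now = time.time()
    cutoff = now - inactive_days * SECONDS_PER_DAY
    selected = []
    for name in sorted(os.listdir(primary_dir)):
        path = os.path.join(primary_dir, name)
        if os.path.isfile(path) and os.stat(path).st_atime < cutoff:
            selected.append(name)
    return selected


def migrate(name, primary_dir, nearline_dir):
    """Archive step: compress the file into the (simulated) nearline store, then free the online copy."""
    src = os.path.join(primary_dir, name)
    dst = os.path.join(nearline_dir, name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    os.remove(src)


def recall(name, primary_dir, nearline_dir):
    """Access step: locate the archived copy, decompress it, and restore it online."""
    # In a real HSM, this is where the device and media would be located and loaded (the lag).
    src = os.path.join(nearline_dir, name + ".gz")
    dst = os.path.join(primary_dir, name)
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    os.remove(src)
    return dst
```

A scheduler would call `select_for_archive` periodically and `migrate` each returned name; `recall` would run whenever an application asks for a file that is no longer online.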

One issue the system administrator faces with nearline archiving is configuring it for optimum storage, that is, archiving only the least-needed data. The HSM system must also operate as intended without regularly degrading performance.

For example, let's say an HSM system is configured and files are migrated to nearline devices. Each time one of those files is accessed and brought back to the online system, a "performance hit," or lag time, is incurred. If the HSM system is not properly configured, one of two situations can occur. In the first, the system administrator does not archive enough data, because he or she is not sure whether the data will be needed again or whether the lag time is acceptable. In the second, too much data is archived, and lag time results every time one of those files is accessed.

A case in point is an application that needs a nearline-archived file every three months. On each occasion, the file is retrieved from a tape robotics system and brought back onto the system, and lag time is incurred. Here's how this scenario plays out: after 60 days without access, the file is moved off the system, and 30 days later, when the application needs it again, it is moved back on. Because of this highly unproductive back-and-forth, most system administrators opt for the first extreme and archive too little data to avoid the lag.
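The churn in this example can be reproduced with a small day-by-day simulation. It assumes a hypothetical policy that migrates a file after 60 days without access, and an application that reads the file every 90 days as an approximation of "every three months"; the function and parameter names are illustrative only.

```python
def simulate(access_every_days, inactive_days, horizon_days):
    """Track one file under a hypothetical inactivity-based HSM policy.

    Returns (migrations, recalls) over the horizon; every recall means lag time.
    """
    online = True
    last_access = 0
    migrations = recalls = 0
    for day in range(horizon_days):
        if day % access_every_days == 0:
            # The application needs the file today.
            if not online:
                recalls += 1  # fetched back from the tape robot: lag incurred
                online = True
            last_access = day
        elif online and day - last_access >= inactive_days:
            # The policy sees an idle file and migrates it to nearline storage.
            online = False
            migrations += 1
    return migrations, recalls


print(simulate(90, 60, 365))   # one year with a 60-day policy
print(simulate(90, 120, 365))  # one year with a 120-day policy
```

With the 60-day policy, the file is migrated and recalled four times in a year, paying the lag on every recall. Raising the threshold above the access interval stops the churn entirely, but at the price of reclaiming less disk space, which is exactly the trade-off that pushes administrators toward archiving too little.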

Then there is cost. Nearline archiving is a highly complex system, and both the hardware and the software are expensive. The highest cost of nearline archiving, or HSM, however, is management: HSM is complex to configure and to manage well. Yet without archiving, system administrators will eventually run out of disk space. Each time this happens, the system is brought down, new hardware is installed and configured, and the data is reloaded. The downtime and management effort are very expensive. (This scenario assumes the hardware was already purchased and delivered. If not, the cost of managing the system skyrockets.) In addition, the more pieces of hardware, the greater the opportunity for failure. The Table shows that, on average, the Mean-Time-Between-Failure (MTBF) of a single disk drive is five years. With 60 disks, the MTBF is about one month, and with 180 disks, it is about 10 days.
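The MTBF figures follow from simple arithmetic if disk failures are assumed to be independent and to occur at a constant rate: the failure rates then add, so the expected time between failures across N disks is the single-disk MTBF divided by N. The sketch below assumes a 365-day year; the function name is illustrative.

```python
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def array_mtbf_days(disk_mtbf_years, n_disks):
    """Expected days between disk failures across n identical drives.

    Assumes independent failures at a constant rate, so the
    population MTBF is the single-disk MTBF divided by n.
    """
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return disk_mtbf_years * DAYS_PER_YEAR / n_disks


for n in (1, 60, 180):
    print(f"{n} disk(s): {array_mtbf_days(5, n):.1f} days")
```

A five-year disk MTBF works out to 1825 days for one drive, about 30.4 days for 60 drives, and about 10.1 days for 180 drives, matching the one-month and 10-day figures in the text.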

Online Archiving

Online archiving for Unix and Windows NT environments is now emerging to resolve the storage and backup issues that plague system administrators. Online archiving refers to taking data that is not used on a regular basis and storing it efficiently on direct access systems: disk drives or enterprise storage systems connected via SCSI, fiber, or other cabling. No additional hardware is required in an online archiving environment. More importantly, beyond efficient data storage, the hallmark of online archiving is high-speed access when the data is needed. Key benefits to the system administrator are reduced backup time and reduced hard-drive requirements, which in turn translate into lower management, maintenance, and support expenditures.
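The article does not say how an online archive stores data "efficiently," so the following is only a sketch under an explicit assumption: that inactive files are compressed in place on the same disk and decompressed on the fly when read. The `.gz` suffix and both function names are hypothetical. What it illustrates is the contrast with offline and nearline archiving: recall needs no tape, robot, or operator, only a read from disk.

```python
import gzip
import os
import shutil

SUFFIX = ".gz"  # hypothetical naming convention for archived files


def archive_in_place(path):
    """Compress an inactive file on the same disk and drop the original.

    Returns the number of bytes of disk space saved.
    """
    before = os.path.getsize(path)
    with open(path, "rb") as fin, gzip.open(path + SUFFIX, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    os.remove(path)
    return before - os.path.getsize(path + SUFFIX)


def read_file(path):
    """Return a file's contents whether it is live or archived online."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    # Archived copy lives on the same direct-access disk: just decompress it.
    with gzip.open(path + SUFFIX, "rb") as f:
        return f.read()
```

Because the archived copy never leaves direct-access storage, callers go through `read_file` and get the same bytes back either way; the only cost of access is decompression time.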

Content provided in partnership with Thompson Gale