Active archiving - Storage

Computer Technology Review, Feb, 2002 by Christine Chudnow

High-transaction databases grow as fast as summer weeds in overdrive. Database administrators (DBAs) have made strides in capacity planning, but it's hard to constantly configure and reconfigure database servers to handle their data loads. Meanwhile, performance suffers as databases take longer and longer to load, unload, search, reorganize, index, and optimize. This has a nasty impact on all sorts of database metrics, including response times, access to needed information, and compliance with service level agreements.

The simple answer is to move inactive data into storage, thus freeing up server space and improving performance. But reality--as usual--is a lot messier. The main issues are:

* How do you define "inactive"? It's not as simple as a flat time period. For example, some data from a drug discovery trial may be three years old, but if the FDA wants it now, the pharmaceutical company had better produce it yesterday.

* Once you consult business rules and policies and identify inactive data, how do you move it without continuous manual intervention? Archiving data is not easy in the first place, especially from a relational database such as DB2, Oracle, Sybase, SQL Server, or Informix. The data in these databases is usually spread across a myriad of tables sharing multiple relationships. Further, records may be in use at any given time, so unless the archiving routine has some way to freeze the database without users noticing, it can corrupt the data with no easy way to restore it.

* Where do you move the data? If a database must occasionally query inactive data, that data must remain available to it. That rules out off-site tape storage. Near-line storage such as tape libraries and cheaper servers may help, but that raises the question--how does the database know where the inactive data resides, and how can it access that data immediately when it's needed?
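
The second issue above, moving related rows without leaving the database half-archived, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for a production database; the table names (orders, items, archive_orders, archive_items), the date cutoff, and both function names are hypothetical and do not come from any product named in this article. The point is only that parent and child rows move inside one transaction, so a failure partway through rolls everything back.

```python
import sqlite3

def make_demo_db():
    """Build a tiny in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, placed TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
        CREATE TABLE archive_orders (id INTEGER PRIMARY KEY, placed TEXT);
        CREATE TABLE archive_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
        INSERT INTO orders VALUES (1, '1998-03-01'), (2, '2002-01-15');
        INSERT INTO items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
    """)
    return conn

def archive_orders(conn, cutoff):
    """Move orders placed before `cutoff`, plus their line items, to the archive tables.

    Returns the number of orders moved.
    """
    old_orders = "SELECT id FROM orders WHERE placed < ?"
    with conn:  # one transaction: commit on success, roll back on any error
        # Copy child rows first, then parent rows, then delete in the same order.
        conn.execute(
            f"INSERT INTO archive_items SELECT * FROM items WHERE order_id IN ({old_orders})",
            (cutoff,))
        conn.execute(
            "INSERT INTO archive_orders SELECT * FROM orders WHERE placed < ?", (cutoff,))
        conn.execute(
            f"DELETE FROM items WHERE order_id IN ({old_orders})", (cutoff,))
        cur = conn.execute("DELETE FROM orders WHERE placed < ?", (cutoff,))
        return cur.rowcount
```

Calling archive_orders(make_demo_db(), '2000-01-01') moves the 1998 order and its two line items together. A real relational archive would also have to follow every foreign-key relationship and cope with rows that users are holding open, which is where the complexity described above comes from.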

Active Archiving

How do you tier storage and organize databases without sacrificing response times? Enter a relatively new term: active archiving. The concept itself dates back 30 years or more to early mainframes, where DBAs identified less frequently used data and moved it to less expensive storage. The effort was always time-consuming, and there have been many initiatives to automate the process. One especially famous approach--Hierarchical Storage Management (HSM)--rose and fell in the mid-'90s because of poor implementations. The underlying concept, though, is the ability to automatically move data between different levels of storage devices, depending on user-defined parameters. Active archiving, a subset of HSM, uses business policies to automatically identify less frequently used data. It then distinguishes between truly historical data and data that must remain immediately available, and moves the latter onto economical networked servers, such as a node with an attached disk farm.
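
As a rough illustration of policy-driven classification, the sketch below sorts records into three tiers: active, active reference (inactive but must stay immediately available), and historical. Everything in it is an assumption made for the example: the Record fields, the under_regulatory_hold flag (standing in for a business rule like the FDA case mentioned earlier), and the 90-day and three-year thresholds. Real policies would come from the business, not from fixed numbers like these.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    key: str
    last_accessed: date
    under_regulatory_hold: bool = False  # hypothetical business rule flag

def classify(record, today, active_days=90, reference_days=3 * 365):
    """Return 'active', 'active_reference', or 'historical' for one record.

    The thresholds are illustrative defaults, not recommendations.
    """
    age = (today - record.last_accessed).days
    if age <= active_days:
        return "active"
    # A flat time period is not enough: a business rule can keep old data reachable.
    if record.under_regulatory_hold or age <= reference_days:
        return "active_reference"
    return "historical"
```

With today set to February 1, 2002, a trial record last touched in January 1998 is classified as historical unless its regulatory-hold flag is set, in which case it stays in the active reference tier.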

For example, Princeton Softech offers advanced database tools that automate the data distinction process according to defined business rules. Princeton Softech Professional Services consults with the client to identify policies according to business rules and critical processes. Once customized with the resulting business policies, the application identifies data that is presently inactive but must remain available, and moves it to another networked server. Princeton Softech refers to this data as "active reference data." When a query sent to the production database may require the active reference data, the database transparently queries this secondary location as well as the primary database files.
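
The article does not describe how this transparent lookup is implemented, so the following is only a generic sketch of the idea, not Princeton Softech's mechanism. It uses sqlite3's ATTACH DATABASE to put a second database (standing in for the secondary server) behind the same connection, and a UNION ALL query to search both locations at once. The schema, the find_orders helper, and the sample rows are hypothetical.

```python
import sqlite3

def open_tiered_db():
    """Open a demo 'production' database with an attached 'archive' database."""
    conn = sqlite3.connect(":memory:")                      # production stand-in
    conn.execute("ATTACH DATABASE ':memory:' AS archive")   # secondary-location stand-in
    conn.execute("CREATE TABLE main.orders (id INTEGER PRIMARY KEY, customer TEXT, placed TEXT)")
    conn.execute("CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, customer TEXT, placed TEXT)")
    conn.execute("INSERT INTO main.orders VALUES (2, 'acme', '2002-01-15')")
    conn.execute("INSERT INTO archive.orders VALUES (1, 'acme', '1998-03-01')")
    conn.commit()
    return conn

def find_orders(conn, customer):
    """Return a customer's orders from both tiers; the caller issues a single call."""
    sql = (
        "SELECT id, placed, 'production' AS tier FROM main.orders WHERE customer = ? "
        "UNION ALL "
        "SELECT id, placed, 'archive' AS tier FROM archive.orders WHERE customer = ? "
        "ORDER BY placed"
    )
    return conn.execute(sql, (customer, customer)).fetchall()
```

The tier column is included only to make the demo visible; in a truly transparent setup the application would not need to know which location a row came from.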

Jim Lee, Princeton Softech's VP of Product Marketing, said, "It's a very simple concept, and the payback is multifold. But the technology is very complex." Lee sees three primary benefits of active archiving: better performance from smaller databases, lower costs from less provisioning and manual intervention, and higher availability from faster backups and shortened maintenance time. Another important benefit is improved disaster recovery: If your data is 70% online and 30% reference data, you will save time by bringing up that 70% immediately. The 30%, which is active reference data, can come up later.

Sybase's Tom Traubitz, Senior Product Marketing Manager for Adaptive Server Enterprise (ASE), defined active archiving this way: "Let's take this to another level of abstraction--you are tiering performance of access, or directness of accessibility based on frequency of use. This idea of tiering storage has been around for a long time." What active archiving does offer is a level of innovation from computer scientists--automating the process of directing types of data into different holders by defining access patterns without human intervention. This work is based on patterns, not on predefined policies by user, application, or other parameters. "This is a fruitful area of research, and why it continues to advance," Traubitz said. He sees this as active archiving's greatest contribution--reducing labor costs while managing storage based on temperature (classifying data by usage patterns along a spectrum of hot to cold values).

Sybase's Sethu Meenakshisundaram, Director of Server Development for ASE, agreed that the technology is complex. He pointed out that the main challenge for sensing and placing data based on usage patterns "is what's categorized as hot data or as cold data. It might change quickly in high transaction environments. So much data needs some kind of human intervention."
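
To make the idea of temperature concrete, here is a minimal sketch that labels data hot, warm, or cold from an access log rather than from predefined policies. It is not Sybase's method, which the article does not detail; the log format (a list of key and day-number pairs), the 30-day window, and the threshold of five accesses are all assumptions chosen for illustration.

```python
from collections import Counter

def temperatures(access_log, now, window=30, hot_threshold=5):
    """Classify each key by how often it was accessed in the last `window` days.

    access_log: list of (key, day_number) pairs; `now` is the current day number.
    Returns a dict mapping each key seen in the log to 'hot', 'warm', or 'cold'.
    """
    recent = Counter(key for key, day in access_log if now - day <= window)
    result = {}
    for key in {key for key, _ in access_log}:
        count = recent[key]  # Counter returns 0 for keys with no recent access
        if count >= hot_threshold:
            result[key] = "hot"
        elif count >= 1:
            result[key] = "warm"
        else:
            result[key] = "cold"
    return result
```

Because the labels depend entirely on the window and threshold, a burst of activity can flip a key from cold to hot between two runs, which is one way to read Meenakshisundaram's caution about fast-changing data in high-transaction environments.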

 

Content provided in partnership with Thompson Gale