Availability and Reliability

ENT, March 26, 2001

Every corporation or enterprise that relies on mission-critical data should have a data-availability strategy in place.

Enterprise-level data availability -- defined as the ability of a system to provide information on demand, whenever and wherever the user requires it -- requires more than simply creating a backup and recovery strategy. There are, after all, application scenarios where 100 percent availability may be required, or where a system failure must not result in any loss of data. Reliability, on the other hand, means the data must be correct: the right data, stored in a valid format.

For example, a large health care organization I once worked for classified its systems based on the criticality of the information. Clinical information systems, used by health care providers dealing with critically ill patients, had the highest data availability and reliability requirements. Availability, in this context, meant that the information had to be there when a physician, nurse, or allied health professional -- such as a laboratory technician or respiratory therapist -- needed it. The reliability requirement meant the data must apply to the patient in question, and it must be valid.

Other systems, such as billing and human resources, had less stringent availability standards, though their reliability standards could be just as high. After all, if billing is inaccurate, the institution won't be reimbursed properly.

Once the goals of availability and reliability are agreed upon, what's the best way to achieve them?

First, specify the degree of availability required for specific data. In the hospital example, if the billing system is down, nobody is going to die. Availability of, say, 75 percent is sufficient. On the other hand, reliability requirements are high -- maybe 95 percent -- since billing errors directly affect the organization's bottom line.
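To make an availability target concrete, it can help to translate the percentage into the downtime it permits. The sketch below is only an illustration: the 75 percent figure comes from the billing example above, while the 99 and 99.9 percent tiers and the function name are assumptions added for comparison.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a system may be down and still meet its target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# 75 is the billing figure from the text; 99 and 99.9 are hypothetical tiers.
for target in (75, 99, 99.9):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {hours:.1f} hours of downtime per year")
```

Seen this way, a 75 percent target tolerates roughly three months of accumulated downtime a year, which is why it suits only systems where an outage is an inconvenience rather than a danger.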

Second, assess the system's vulnerability to various threats. Failure points include hardware, software, and human error. Hardware threats include power outages, disk-drive failures, network vulnerability, and internal system failures.

These types of vulnerabilities can be addressed by redundant or hardened hardware solutions. For example, if the systems are critical, they should be in a computer room with dedicated, filtered power, and either battery or generator backup. Similarly, disk-drive outages can be minimized using RAID technology. If you're really concerned about the physical environment of your systems, it may be worth investigating hosting them at a dedicated, professional data center. In most cases, it's no longer cost-effective to build your own data center.

Software failures can include several types of operating system, application, and device driver errors. These kinds of failures are difficult to predict, but one good way to reduce their incidence is to introduce new software and upgrades in a staged process. Use dedicated system resources to stage the software, run it through its paces with regression and stress tests, and validate that specific functions work as required. Then, when you deploy the system, do it in a systematic and documented way. If there are any problems, you can quickly pinpoint what changes were made to the system and when they were introduced. A careful, documented configuration management process can be invaluable in reducing or eliminating errors due to software incompatibilities. There are other solutions, such as clustering, backup/recovery, and journaling, that are available from the application solution provider or independent companies.
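As a rough illustration of why a documented deployment record pays off, the sketch below keeps a simple change log and lists everything applied after the last time the system was known to be healthy. The `Change` record, the `changes_since` helper, and the sample entries are all hypothetical; a real configuration management process would track far more detail.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    applied_at: datetime
    component: str
    description: str


def changes_since(log, last_known_good):
    """Return changes applied after the last known-good time, oldest first."""
    return sorted(
        (c for c in log if c.applied_at > last_known_good),
        key=lambda c: c.applied_at,
    )


# Hypothetical log entries, recorded in no particular order.
change_log = [
    Change(datetime(2001, 3, 12, 22, 0), "database", "applied vendor patch"),
    Change(datetime(2001, 3, 5, 21, 30), "operating system", "installed service pack"),
    Change(datetime(2001, 3, 19, 23, 15), "device driver", "upgraded RAID controller driver"),
]

last_known_good = datetime(2001, 3, 10, 8, 0)
suspects = changes_since(change_log, last_known_good)
for change in suspects:
    print(f"{change.applied_at:%Y-%m-%d %H:%M}  {change.component}: {change.description}")
```

Here the service pack predates the last known-good time, so only the database patch and the driver upgrade remain as candidates to investigate.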

Human error is the toughest to stop, because humans are ingenious at coming up with new ways to mess with data. Certainly, using software algorithms to filter out invalid data with reasonability tests is one core approach. Defining database column constraints and triggers can also be invaluable in detecting and rejecting invalid data. It is also worthwhile to occasionally run the database integrity utility that comes with most relational databases.
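A minimal sketch of column constraints and triggers at work, using Python's built-in `sqlite3` module with an in-memory database. The table names, the heart-rate range, and the `try_insert` helper are assumptions made up for illustration, loosely following the clinical example earlier. The CHECK constraint acts as a reasonability test on the value, and the trigger rejects readings that do not belong to a known patient.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE readings (
    id         INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL,
    -- Reasonability test: the range is an illustrative assumption.
    heart_rate INTEGER NOT NULL CHECK (heart_rate BETWEEN 20 AND 300)
);
-- Reject readings that refer to a patient who is not on file.
CREATE TRIGGER readings_patient_exists
BEFORE INSERT ON readings
BEGIN
    SELECT RAISE(ABORT, 'unknown patient')
    WHERE NOT EXISTS (SELECT 1 FROM patients WHERE id = NEW.patient_id);
END;
""")

with conn:
    conn.execute("INSERT INTO patients (id, name) VALUES (1, 'Test Patient')")


def try_insert(patient_id, heart_rate):
    """Attempt an insert; return True if accepted, False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO readings (patient_id, heart_rate) VALUES (?, ?)",
                (patient_id, heart_rate),
            )
        return True
    except sqlite3.IntegrityError as exc:
        print(f"rejected: {exc}")
        return False


try_insert(1, 72)    # plausible value for a known patient
try_insert(1, 900)   # fails the CHECK constraint
try_insert(99, 72)   # fails the trigger: no such patient
```

Putting the rules in the database rather than in each application means every program that writes to the table is held to the same standard, whatever mistakes its users make.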

Putting together a comprehensive data availability and reliability strategy can be difficult and complex, since there are so many ways for data to be corrupted. But if you focus on the data that is mission-critical, and come up with a process for ensuring the appropriate level of reliability, the rewards can be huge down the line.

--Robert Craig is vice president of marketing at Viador Inc. (Burlington, Mass.), and a former director at the Hurwitz Group Inc. Contact him at robert.craig@viador.com.

COPYRIGHT 2001 1105 Media, Inc.
COPYRIGHT 2008 Gale, Cengage Learning
 

Content provided in partnership with Thomson Gale