Rapid restores from data disasters

Computer Technology Review, Feb, 2004 by Ganapathy Krishnan

As an IT administrator, you feel a sense of utter panic when you realize that the machine holding your critical data is unusable, whatever the reason. What's more, you realize that backing up your data was the easy part--restoring it under the gun is time-consuming and tedious. In many instances, it can take a whole day to partially recover your data and restore your server.

This article will examine the most frequent causes of data disasters, such as hardware failure, application corruption, and site disasters. It will also describe an architecture that is equally effective for protecting transactional applications--such as databases, mail, and files--using Zetta Server, a 64-bit operating system that can be installed on standard Intel servers.

Appliances based on Zetta Server provide advanced data protection, high availability, and rapid restores for Windows, Unix, and Mac OS X platforms--all managed from a single unified Web interface. The secret sauce is a combination of redundant hardware, unified file and block technology, sophisticated application-aware (e.g., Exchange, SQL Server) snapshot technology, and advanced replication that allows you to replicate both file and block data to a remote facility.

[GRAPHIC OMITTED]

Data Disasters

There are four major causes of data disasters:

* Hardware failure

* Admin/user error

* Application corruption

* Site disasters

Hardware Failure: Hardware failures can be mitigated by using redundant power supplies, RAID, mirrored memory, and an Active/Passive failover system.

Admin/User Error: Administrator error can arise in several ways. The traditional backup process is largely untested for restores, so administrator errors often surface only in an emergency. There is also no easy way to verify that backups have executed successfully.

Application Corruption: Applications such as Microsoft Exchange, SQL Server, and Oracle sometimes corrupt data. Almost every system or database administrator with any level of experience has spent hours wrestling with a corrupt database or a corrupt information store. This is difficult to deal with because the backups themselves may be corrupt, and there is virtually no remedy other than extracting backup after backup from tape until the application recognizes a consistent one. Application corruption can also cause significant data loss.

Site Disasters: Site disasters can be caused by earthquakes, fires, floods, terrorism, and sometimes by something as simple as the failure of air conditioning in the server room. A site disaster is often devastating: fewer than 40% of companies recover successfully from one. Even companies that use offsite backup have to deal with data loss, because backups are typically over a week old. Recreating servers from backups is a monumental task, especially if it involves databases.

Data Protection: It's All About Rapid Restores

Traditionally, backup has been one of the chores relegated to the junior system administrator. Most companies religiously back up to tape or tape libraries--a full backup once a week and incremental backups every day. In the event of a disaster (especially with data continuously generated by applications such as Exchange, SQL Server, and Oracle) current backup techniques are woefully inadequate for the following reasons.

First, a backup restored from tape is likely to be at least 24 hours old. Second, it is quite likely that the most recent backup is not usable. Third, it is not possible to determine whether Exchange data or databases are corrupt until the backup is restored and working correctly with the application. The application server is often down for the better part of a day while backup administrators figure out which backup to use. After going through this painful process, the backup that is successfully restored is often a week old.

Disk-Based Backup

A number of companies tout disk-based backup as the panacea for these problems. Disk-based backups are great for reducing the time needed to back up vast amounts of data, but they share the problems of tape backup, because it is very difficult to back up application-generated data continuously to disk.

Point-in-Time Copy or Snapshot Technology

Snapshot technologies allow companies to create point-in-time copies or online backups as the system is running. In the event of data corruption or data loss, users can roll the state back to the last snapshot or the last uncorrupted snapshot and restore. According to Randy Kerns, an analyst with the Evaluator Group, "With snapshots, you do not have to take your application out of service."
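
The rollback decision described above can be illustrated with a minimal Python sketch. It is not Zetta Server code: the `Snapshot` record, the `latest_good_snapshot` function, and the `is_consistent` check are hypothetical names introduced here, and the check stands in for whatever validation the application (for example, Exchange or SQL Server) would actually perform.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record describing one point-in-time copy."""
    name: str
    taken_at: datetime


def latest_good_snapshot(
    snapshots: Iterable[Snapshot],
    is_consistent: Callable[[Snapshot], bool],
) -> Optional[Snapshot]:
    """Return the most recent snapshot that passes the consistency check.

    is_consistent is an assumed, caller-supplied test; None is returned
    when no snapshot passes it.
    """
    # Walk from newest to oldest and stop at the first usable snapshot.
    for snap in sorted(snapshots, key=lambda s: s.taken_at, reverse=True):
        if is_consistent(snap):
            return snap
    return None


if __name__ == "__main__":
    snaps = [
        Snapshot("hourly-09", datetime(2004, 2, 2, 9, 0)),
        Snapshot("hourly-10", datetime(2004, 2, 2, 10, 0)),
        Snapshot("hourly-11", datetime(2004, 2, 2, 11, 0)),
    ]
    # Pretend the corruption was captured only by the 11:00 snapshot.
    corrupt = {"hourly-11"}
    chosen = latest_good_snapshot(snaps, lambda s: s.name not in corrupt)
    print(chosen.name if chosen else "no usable snapshot")
```

Because the snapshots are already online, the search walks a list of candidates rather than pulling tape after tape, which is where the time savings in a rapid restore would come from.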

Most users are used to traditional backup methodologies: Level 0 backups every week and Level 1 backups every day. Restoring from these backups involves first loading a Level 0 backup and then a Level 1 backup. Unlike traditional backups, snapshots have the following important features:

* Each snapshot is complete in itself. It is not necessary to load the original snapshot, and then add incremental snapshots to create a final image.
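
To make the contrast concrete, here is a rough Python sketch of how a traditional restore sequence might be computed. The `Backup` record and `restore_chain` function are hypothetical. The sketch assumes that each Level 1 backup holds every change since the most recent Level 0, so only the newest Level 1 needs to be loaded on top of the full backup; tools whose incrementals only cover the previous day would need a longer chain.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Backup:
    """Hypothetical catalog entry for one tape backup."""
    taken_on: date
    level: int  # 0 = full; 1 = changes since the last Level 0 (assumed)


def restore_chain(catalog: List[Backup], target: date) -> List[Backup]:
    """Return the backups to load, in order, to restore state as of target."""
    usable = [b for b in catalog if b.taken_on <= target]
    fulls = [b for b in usable if b.level == 0]
    if not fulls:
        raise ValueError("no Level 0 backup on or before the target date")
    # Start from the newest full backup that is not after the target.
    base = max(fulls, key=lambda b: b.taken_on)
    # Then add the newest Level 1 taken after that full backup, if any.
    later = [b for b in usable if b.level == 1 and b.taken_on > base.taken_on]
    chain = [base]
    if later:
        chain.append(max(later, key=lambda b: b.taken_on))
    return chain


if __name__ == "__main__":
    catalog = [
        Backup(date(2004, 2, 1), 0),
        Backup(date(2004, 2, 2), 1),
        Backup(date(2004, 2, 3), 1),
    ]
    for step in restore_chain(catalog, date(2004, 2, 3)):
        print(step.taken_on.isoformat(), "level", step.level)
```

With snapshots, the equivalent chain would always contain a single entry, since each snapshot is complete in itself.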

 

Content provided in partnership with Thomson Gale