Shared Data Clusters: Achieving Application Scalability And Availability With A SAN

Computer Technology Review, Jan, 2001 by Paul Massiglia, Anne Janzer

Examining the requirements for implementation

This article is the first in a two-part series. The second part will appear in the February issue of CTR.

The emergence of Storage Area Networks (SANs) is enabling new system configurations that take advantage of shared storage. One common configuration is the "shared nothing" or availability cluster, in which multiple computers share common storage and client access, and any of them can "take over" for a computer (node) in the cluster that fails. Shared nothing clusters can be implemented on Windows or Unix systems using products like VERITAS Cluster Server or Microsoft Cluster Server.

The servers in these clusters do not access the same data concurrently. Instead, one server owns and accesses a given set of data in the common storage. That ownership can be passed to another server if the first should fail.

A shared data cluster takes a different approach, allowing nodes in a cluster to access the same data at the same time. Although sharing data introduces challenges of maintaining data integrity, it also leverages the capabilities of the SAN to consolidate data, improve availability, and support scalable solutions.

This article discusses the requirements for implementing shared data clusters and the challenges and benefits of this approach. It also discusses the VERITAS SANPoint Foundation Suite HA to illustrate how these challenges can be addressed with a packaged software solution.

The second part of this article (to appear in the February issue of CTR) will describe sample applications for shared data clusters, focusing on those kinds of applications that can derive the greatest benefit from this configuration.

These articles concentrate on applications that use file system data; they do not discuss the separate but important issue of clustered databases. Databases, whether residing in file systems or on raw partitions, require clustered database managers such as Oracle Parallel Server to provide the highly granular locking and integrity capabilities necessary in a multi-instance environment. This subject is beyond the scope of this discussion.

Background: Clusters And Shared Storage

A cluster is generally defined as a grouping of computer systems that cooperate to work as a single entity in some capacity from a client's perspective. These systems must communicate with each other and share access to common storage.

The essential components of a cluster include (Fig 1):

* Multiple, independent servers

* Common client access

* Commonly accessible storage

* Software that manages the "clustering" behavior

Although they have been around for quite a while in proprietary or high-end applications, clusters are an increasingly popular system design today for several reasons.

Availability is a key concern not just for a few specialized applications, but also for business systems ranging from those of financial institutions to online storefronts to departmental file servers. A cluster enhances application availability because it provides redundant servers accessing common storage; if one server fails, another can take over its work.

Improvements in processors and systems are making it possible to create clusters of desktop or midrange systems that can serve very demanding environments.

SANs, which may be adopted for a variety of reasons, provide readily available shared storage for cluster configurations. Clusters do not actually require SANs--you can build a cluster using a simple switched SCSI device. But the SAN makes it possible to share much larger storage pools among more servers. And clusters can help organizations leverage the benefits of the SAN implementation.

Shared Storage vs. Shared Data

Although all clusters share access to common storage, not all clusters actually share the data itself.

The most common cluster architecture today is a "shared nothing" cluster, which means that the systems in the cluster do not share memory or concurrent access to data on the common storage. Although data resides on common storage (the SAN), only one system "owns" and accesses the data at any time. Another system (or node) on the cluster only accesses that data if it is taking over the tasks of the original node (Fig 2).

A shared nothing cluster is useful for improving application availability. Any application can run in this configuration transparently. If there is a failure, another node in the cluster can start the application and serve new requests.
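The ownership and takeover behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `SharedNothingCluster`, its methods, and the node and volume names are hypothetical and do not reflect the interface of VERITAS Cluster Server, Microsoft Cluster Server, or any other product.

```python
class SharedNothingCluster:
    """Toy model: every volume on common storage has exactly one owner node."""

    def __init__(self, nodes):
        self.alive = {node: True for node in nodes}
        self.owner = {}  # volume name -> owning node

    def assign(self, volume, node):
        """Give a live node exclusive ownership of a volume."""
        if not self.alive.get(node, False):
            raise ValueError(f"{node} is not a live cluster member")
        self.owner[volume] = node

    def read(self, node, volume):
        """Only the current owner may access the data."""
        if self.owner.get(volume) != node:
            raise PermissionError(f"{node} does not own {volume}")
        return f"{node} read {volume}"

    def fail(self, node):
        """Mark a node as failed and pass its volumes to a surviving node."""
        self.alive[node] = False
        survivors = [n for n, up in self.alive.items() if up]
        if not survivors:
            raise RuntimeError("no surviving nodes to take over")
        for volume, owner in self.owner.items():
            if owner == node:
                # Hypothetical policy: the first survivor takes over.
                self.owner[volume] = survivors[0]


cluster = SharedNothingCluster(["node_a", "node_b"])
cluster.assign("vol1", "node_a")
cluster.fail("node_a")
takeover_owner = cluster.owner["vol1"]  # ownership has moved to node_b
```

At no point do two nodes hold the same volume; the data changes hands only when the owner fails.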

Many web farms adopt this basic architecture. Each node (web server) accesses its own read-only copy of a web site. Load balancing applications (such as Cisco LocalDirector) direct clients to web servers. Scaling up is as easy as adding another server with its own copy of the site.
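A minimal sketch of this web farm arrangement follows. It assumes a simple round-robin policy, which is only one possible way to distribute clients and is not a description of how any particular load balancing product works; the class `RoundRobinBalancer` and the server names are hypothetical.

```python
class RoundRobinBalancer:
    """Toy load balancer: hands out web servers in rotation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index of the next request

    def add_server(self, server):
        """Scale up by adding a server that holds its own copy of the site."""
        self.servers.append(server)

    def route(self):
        """Pick the server for the next client request."""
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server


farm = RoundRobinBalancer(["web1", "web2"])
first_four = [farm.route() for _ in range(4)]

farm.add_server("web3")  # the new node needs only its own read-only copy
next_three = [farm.route() for _ in range(3)]
```

Because each server reads only its own copy, adding `web3` requires no coordination with the existing nodes.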

A shared data cluster actually lets multiple nodes in the cluster access shared data concurrently. The basic hardware configuration is unchanged. The difference lies in the software managing the shared data--the clustering software as well as the clustered volume and file system software (Fig 3).
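To illustrate why the software layer matters, the sketch below simulates several nodes updating the same data at once. It rests on loud assumptions: threads in one process stand in for cluster nodes, and a `threading.Lock` stands in for whatever cluster-wide coordination the volume and file system software provides. The class `SharedLog` and all names are hypothetical.

```python
import threading


class SharedLog:
    """Toy stand-in for a file that several nodes update concurrently."""

    def __init__(self):
        self.entries = []
        self.next_id = 0
        # Assumption: a single in-process lock models cluster-wide locking.
        self._lock = threading.Lock()

    def append(self, node, text):
        """Allocate the next entry id and store the entry as one atomic step."""
        with self._lock:
            entry_id = self.next_id
            self.entries.append((entry_id, node, text))
            self.next_id = entry_id + 1
            return entry_id


def node_worker(log, node, count):
    """Each simulated node appends its own entries to the shared log."""
    for i in range(count):
        log.append(node, f"update {i}")


shared_log = SharedLog()
workers = [
    threading.Thread(target=node_worker, args=(shared_log, name, 100))
    for name in ("node_a", "node_b", "node_c")
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

total_entries = len(shared_log.entries)
```

Without the lock, two nodes could read the same `next_id` and write conflicting entries, which is the kind of integrity problem the managing software has to prevent.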


 

Content provided in partnership with Thomson Gale