Smart object-based storage cluster computing - Storage Networking
Computer Technology Review, Oct, 2003 by Rod Schrock
The advent of Linux compute clusters has forever changed the high-performance computing landscape. Proprietary, expensive supercomputers are no longer the default for the most challenging computing problems; nearly every new supercomputing system installed today consists of thousands of low-cost Linux servers united into a cluster. To unlock the full potential of these Linux compute clusters, a complementary data storage solution is needed. There is just such a solution: object-based storage clustering. Object-based storage clustering systems have the intrinsic ability to scale linearly in capacity and performance to meet the demands of the most powerful Linux-based clusters.
There are many technical and commercial applications that are benefiting from the "scale out" Linux clustering wave. For example, geophysicists are developing more capable seismic analysis techniques to image the earth's substructure and guide oilfield drilling, resulting in a 25% higher predictability of energy discovery.
Pharmaceutical companies mine massive genomic datasets to provide better insight into human diseases and develop more personalized therapies. Commercial aircraft and automobile designers develop extensive computer simulations to make transportation faster, safer, and more comfortable. Internet portals such as Google index the content of the hundreds of millions of pages that make up the Web.
All of these applications are recognized for their computational complexity. Often underestimated, however, is their equally ravenous appetite for high-performance data access. Without rapid and efficient access to data, scarce computing resources sit idle. Unfortunately, traditional networked storage systems are simply incapable of providing the data throughput needed to keep ever-growing Linux clusters operating efficiently.
Equally important, these massive datasets need to be made globally available to all processes executing across the compute cluster to simplify application development and to ease the burden of managing data repositories. Here again, traditional networked storage systems fall short: they are incapable of scaling capacity within a single namespace and thereby increase the time and complexity of managing networked data.
Data Access Patterns
To understand the need for a new approach to scalable storage, we first explore the manner in which many cluster computing applications address the storage bottleneck. Linux cluster applications use a scale-out approach to parallel computing. In this model, applications employ a 'divide-and-conquer' approach, decomposing the problem to be solved into thousands of independently executed tasks. The most common decomposition approach exploits a problem's inherent data parallelism--breaking the problem into pieces by identifying the data partitions that comprise the individual task, then distributing each task and corresponding partition to the compute nodes for processing.
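The data-parallel decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any particular scheduler's method: the names `Task`, `partition_data`, and `assign_round_robin` are hypothetical, the partitions are simple contiguous slices, and tasks are handed to nodes in round-robin order.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Task:
    """One independently executable unit of work (hypothetical type)."""
    task_id: int
    partition: Sequence[int]  # the slice of the dataset this task owns


def partition_data(data: Sequence[int], num_tasks: int) -> List[Task]:
    """Split data into num_tasks contiguous partitions of near-equal size."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    base, extra = divmod(len(data), num_tasks)
    tasks: List[Task] = []
    start = 0
    for task_id in range(num_tasks):
        # The first `extra` tasks each take one additional element.
        size = base + (1 if task_id < extra else 0)
        tasks.append(Task(task_id, data[start:start + size]))
        start += size
    return tasks


def assign_round_robin(tasks: List[Task], nodes: List[str]) -> Dict[str, List[Task]]:
    """Distribute each task, with its partition, to compute nodes in turn."""
    if not nodes:
        raise ValueError("at least one compute node is required")
    assignment: Dict[str, List[Task]] = {node: [] for node in nodes}
    for index, task in enumerate(tasks):
        assignment[nodes[index % len(nodes)]].append(task)
    return assignment


if __name__ == "__main__":
    dataset = list(range(10))               # stand-in for a real dataset
    tasks = partition_data(dataset, 4)
    plan = assign_round_robin(tasks, ["node0", "node1", "node2"])
    for node, node_tasks in plan.items():
        print(node, [list(t.partition) for t in node_tasks])
```

In this sketch each task carries its partition with it, so every node must be able to read the data its tasks own. That is the requirement that makes shared storage so attractive to cluster developers.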
For example, animation-rendering applications distribute scene generation tasks to hundreds of cluster compute nodes--each generating an individual frame of the final segment. Shared scene and character information and per-frame rendering instructions must be distributed to each of the compute nodes, and each node generates as much as 50 MB of output per frame. The individual frames are then sequenced and assembled into their final form for review. This is a common data access scenario across many cluster-computing applications.
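To get a feel for the storage load such a rendering job generates, a back-of-the-envelope estimate helps. In the sketch below, only the 50 MB of output per frame comes from the text; the node count and the time each node spends rendering a frame are hypothetical values chosen purely for illustration, and the function name is likewise an assumption.

```python
def aggregate_write_rate_mb_s(nodes: int, mb_per_frame: float,
                              seconds_per_frame: float) -> float:
    """Sustained write rate (MB/s) the shared storage must absorb when every
    node finishes one frame of mb_per_frame every seconds_per_frame seconds."""
    if nodes < 0 or mb_per_frame < 0:
        raise ValueError("nodes and mb_per_frame must be non-negative")
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return nodes * mb_per_frame / seconds_per_frame


if __name__ == "__main__":
    # 50 MB per frame is the figure from the text; 200 nodes and 100 seconds
    # per frame are assumed values, not measurements.
    rate = aggregate_write_rate_mb_s(nodes=200, mb_per_frame=50.0,
                                     seconds_per_frame=100.0)
    print(f"aggregate output rate: {rate:.1f} MB/s")
```

The estimate covers output only. Distributing the shared scene and character data to every node adds read traffic on top of it, and the total grows in proportion to the number of nodes.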
The natural inclination of cluster computing developers is to deploy a networked storage solution that can be accessed by all nodes in the cluster. Such a solution greatly simplifies management of the compute jobs as all data partitions and replicas can be made available to all nodes, and hence any of the tasks can be computed on any node. Additionally, the output of these jobs can then be used directly elsewhere--in post-processing, visualization or even as the input to the next processing task in a computational pipeline.
Unfortunately, standard shared-storage solutions provided by NFS file servers are only sufficient for small clusters of 10 to 20 nodes. Larger clusters require more scalable storage solutions. Storage Area Networks (SANs) and Network Attached Storage (NAS) architectures have been employed for modest-sized clusters of approximately 20 to 50 nodes. However, these architectures have severe limitations as clusters become larger.
Neither SAN nor NAS architectures support the aggressive concurrency and per-client throughput requirements of scalable cluster computing applications. SANs were designed to provide a modest number of application servers with high-performance, highly reliable access to a shared pool of storage devices (e.g., for enterprise transactional databases). SANs improve the storage provisioning process, allowing disks to be moved among application servers to address changes in capacity requirements; but this leads to application server-based islands of data. NAS systems, on the other hand, were designed to afford widespread data sharing on heterogeneous platforms with relatively modest per-user I/O requirements (e.g., for user home directory storage).



