The challenges of data management in biotech: booming life sciences research has IT resources bursting at the seams - Storage Networking - Industry Overview

Computer Technology Review, July, 2002 by Christine Taylor Chudnow

Pharmaceutical companies typically spend over $200 million and around 12 years to hypothesize, develop, and seek FDA approval for a new drug. Pharmaceuticals and other life sciences companies are desperately seeking faster and cheaper ways to invent more unique drugs and are hoping to capitalize on the tremendous advances in genomic and proteomic research. To aid in their research, life sciences companies have made tremendous use of biotech tools: computing hardware and software developed specifically for the life sciences sector.

Life sciences researchers depend heavily on these key biotech areas:

* In silico biology, which covers computational tools that translate raw data into workable models or simulations, guiding target selection and drug development.

* Bioinformatics and genome research, which also rely on computational tools, in this case to process the overwhelming amount of data gathered through genome research such as the Human Genome Project. (Human genome research is not the only game in town. The other hottest genomic research subjects are mice, worms, yeast, and, believe it or not, puffer fish.)

* Proteomics, which is high-throughput protein expression analysis and characterization and which shapes diagnostic and therapeutic product development.

During the tremendously challenging drug-discovery process, life sciences researchers and manufacturers commonly use dozens of specialized technologies to identify commercially important genes, discover their functions, validate drug and product development targets, and identify and develop clinical candidates. For example, Millennium Pharmaceuticals uses six different types of technologies during its drug-discovery procedures:

* Biosensors, which are highly sensitive microchips that use tiny samples to analyze interactions among proteins and between proteins and small molecules.

* High-throughput DNA sequencing software, which automates the process of capturing, storing, and analyzing data about the DNA sequence.

* High-throughput screening, which draws from a library of validated targets to test multiple compounds against each drug target, discovering and reporting appropriate responses.

* Imaging, which displays data while retaining correct spatial relationships between individual data points.

* Informatics, which are sophisticated computational tools that access and interpret public and proprietary databases on a wide range of scientific information.

* Microfluidics, which tests genomes and drugs in minuscule volumes measured in nanoliters.

The Search for Power

A common search function underscores the industry's need for processing power. Scientists often search across genomic and proteomic databases to compare sequences, which are letter strings that represent genes and proteins. Genes are written as strings over a four-letter alphabet, with each letter standing for a different nucleotide. Proteins are written as strings over a twenty-letter alphabet, with each letter standing for a different amino acid. Since two sequences can match partially and at many different positions, sequence comparison requires high-powered, specialized search programs such as BLAST (Basic Local Alignment Search Tool) that can quickly range across multiple-terabyte databases.

To bring more processing power to these kinds of searches, as well as to bioinformatics operations and in silico experiments, biotech has adopted supercomputing, clustering, and grid computing.
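To make the idea of comparing letter strings concrete, here is a minimal Python sketch of local alignment scoring in the style of the Smith-Waterman algorithm. It is not how BLAST works internally (BLAST uses faster heuristics to avoid scoring every position), and the function names, scoring values, and toy database below are assumptions chosen for illustration only.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Smith-Waterman dynamic programming with a linear gap penalty.
    The scoring values are illustrative assumptions, not standard ones.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Rank the entries of a {name: sequence} dict by score, best first."""
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database of nucleotide strings.
    toy_db = {"s1": "CCCGATTACACCC", "s2": "TTTTTTTT", "s3": "GATCACA"}
    for score, name in search("GATTACA", toy_db):
        print(name, score)
```

The cost of this exhaustive approach grows with the product of the two sequence lengths for every database entry, which suggests why multi-terabyte searches call for both heuristic tools and substantial compute power.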

Supercomputers: The Bioinformatics Center of the Institute for Chemical Research (ICR) at Japan's Kyoto University uses supercomputers as the network servers for its KEGG (Kyoto Encyclopedia of Genes and Genomes), a major global genome database system. Three Sun Microsystems Sun Fire 15K computers work at the system's core, each one containing 72 CPUs, 144GB of memory, and 15TB of storage. The Sun Fire offers an SMP (symmetric multiprocessing) architecture, which is foundational to many life sciences applications.

As part of its Blue Gene life sciences computing project, IBM has partnered with the Department of Energy's (DOE) National Nuclear Security Administration. Working with Lawrence Livermore National Laboratory, IBM is designing a new Blue Gene supercomputer called Blue Gene/L. It is slated to operate at about 200 teraflops (200 trillion floating-point operations per second). (IBM and Lawrence Livermore co-developed the world's current record-breaking supercomputer, the "ASCI White" machine now in operation at the lab.) Compaq is racing to better IBM's record in the life sciences and is collaborating with genetic database company Celera Genomics and the DOE's Sandia National Labs. The Compaq supercomputer should reach at least 100 teraflops, which is about eight times faster than the existing ASCI White but considerably slower than Blue Gene/L. Both machines have target dates in 2004.

Clusters: Many life sciences companies can't afford to keep a Sun Fire in the basement and, therefore, turn to clusters for running parallel compute operations. A cluster is a group of tightly integrated servers that acts as a single computer. Clusters running proprietary clustering software can be costly, so a popular alternative is the PC-based Beowulf cluster. Beowulf clusters combine an open-source operating system (usually Linux) and open-source applications, economical PC servers, and a high-speed backbone. This configuration can yield affordable, virtual supercomputing for applications whose data or tasks can be processed in parallel. The NIH, for example, has invested heavily in its 176-node "Biowulf" cluster (proving that life sciences researchers actually do have a sense of humor).
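The phrase "data or tasks can be processed in parallel" can be illustrated with a small scatter-gather sketch in Python. This is only an analogy under stated assumptions: worker threads on one machine stand in for cluster nodes, the workload (counting G and C bases) is a made-up example, and all function names are hypothetical. A real Beowulf cluster would more likely distribute work with a message-passing library or a batch scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(sequences):
    """Work done by one 'node': count G and C bases in its share of the data."""
    return sum(seq.count("G") + seq.count("C") for seq in sequences)


def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for k in range(n_parts):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def scatter_gather(items, n_workers, work=gc_count,
                   executor_cls=ThreadPoolExecutor):
    """Scatter chunks to workers, run them concurrently, and sum the results."""
    chunks = partition(items, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))
    return sum(partials)


if __name__ == "__main__":
    seqs = ["GATTACA", "CCGG", "ATAT", "GCGC", "A"]
    print(scatter_gather(seqs, n_workers=3))
```

The pattern works here because each chunk can be processed without reference to any other chunk; only the small partial results need to be combined at the end. Workloads with heavy communication between pieces benefit far less from this kind of inexpensive cluster.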


Content provided in partnership with Thompson Gale