Annotation and cross-indexing of array elements on multiple platforms - Genomics and Risk Assessment: Mini-Monograph
Environmental Health Perspectives, March 15, 2004 by William B. Mattes
On the surface, transcript profiling using microarrays seems to offer a way of looking at the global response of the cell to perturbation, with a focus on changes in gene expression. The difficulty, however, is that the response of a particular gene is actually measured on the array by an element that is a short, defined nucleic acid sequence. Sequences that map back to the same genetic locus may be given different names and descriptions when they are deposited in public sequence databases; when such sequences are used in microarray construction, elements that monitor the same genetic locus may therefore carry different names and descriptions. The algorithm described here uses a hierarchical approach to assign a single best annotation to the elements in a given microarray in such a way that elements from one microarray platform may be cross-indexed with those of another. The algorithm relies on the nucleic acid accession number for a given array element and uses it to retrieve annotation from the most recent versions of LocusLink and UniGene. Both database resources are searched, with priority given to annotation derived from the curated LocusLink database. When neither database provides an annotation, the default GenBank annotation is used. As a final outcome, a cross-chip identifier is generated that may be used to cross-index array elements. The program is available as a practical extraction and report language (Perl) script that can run under any Perl interpreter. Key words: annotation, cross-platform, indexing, LocusLink, microarray, UniGene. Environ Health Perspect 112:506-510 (2004). doi:10.1289/txg.6698 available via http://dx.doi.org/ [Online 15 January 2004]
**********
On the surface, microarrays and other genomic technologies offer the toxicologist a look at the transcript levels for hundreds to thousands of genes. However, although toxicologists and cell biologists think in terms of genes and pathways, these technologies actually measure nucleic acid sequences. The challenge, therefore, is to associate each nucleic acid sequence unambiguously with the most current and consistent information on the gene of which it is part. This association is complicated by the fact that the same sequence can be submitted to public databases by several sources, each of which may assign it a different name and description. For example, the gene N-myc downstream regulated (Ndrg1) (LocusID 10397; http://www.ncbi.nih.gov/LocusLink/) was originally cloned and submitted by three laboratories as separate sequence entries with different names: RTP (GenBank accession no. D87953; http://www.ncbi.nih.gov/GenBank), a homocysteine-respondent gene in vascular endothelial cells (Kokame et al. 1996); DRG1 (GenBank accession no. X92845), a gene upregulated during colon epithelial cell differentiation (Van et al. 1997); and CAP43 (GenBank accession no. AF004162), a gene specifically induced by Ni2+ compounds (Zhou et al. 1998). All three sequences are identical and represent the same gene. Because microarrays are built from individual sequences or clones annotated in this fashion, identifying the elements (i.e., spots) on a single array, or on different arrays, that represent a given gene can be a frustrating exercise.
Our approach to annotating microarray elements makes use of two public databases: UniGene (http://www.ncbi.nih.gov/UniGene/; Wheeler et al. 2000) and LocusLink (http://www.ncbi.nih.gov/LocusLink/; Pruitt and Maglott 2001). UniGene is an experimental system for grouping GenBank sequences (http://www.ncbi.nih.gov/GenBank/) into gene-oriented clusters, whereas LocusLink is a database of curated sequence and descriptive information about genetic loci. Together these resources allow us to map a given microarray element to a gene, using UniGene and the GenBank accession number of the element, and to annotate that gene using LocusLink information. Furthermore, the process is automated with a computer script that can be run on a regular basis to make use of current database information. Although our approach resembles those taken by the DRAGON database (http://pevsnerlab.kennedykrieger.org/dragon.htm; Bouton and Pevsner 2000) and the DAVID software (http://apps1.niaid.nih.gov/david/upload.asp; Dennis et al. 2003), it differs in seeking to create a single best annotation for a sequence and, based on this hierarchical process, to generate a cross-chip ID. Although there are caveats to this approach, the results show that it generally allows elements representing a single gene to be identified both within and across microarray platforms. This approach has been applied to comparing results generated in the multi-laboratory genomics research program coordinated by the International Life Sciences Institute (ILSI) Health and Environmental Sciences Institute (HESI) Committee on the Application of Genomics to Mechanism-Based Risk Assessment.
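As a minimal sketch of the accession-to-UniGene-to-LocusLink mapping described above, the Python fragment below resolves each element's GenBank accession to a cluster and then to a LocusID, and groups elements from two platforms that land on the same locus. The lookup tables, cluster labels, platform names, and element IDs are hypothetical stand-ins for the current UniGene and LocusLink releases; only the three accession numbers and LocusID 10397 come from the Ndrg1 example in the text, and placing all three accessions in one cluster is an assumption made for illustration. This is not the article's Perl script.

```python
from collections import defaultdict

# Hypothetical lookup tables standing in for UniGene and LocusLink.
# In practice these would be built from the current database releases.
ACCESSION_TO_CLUSTER = {
    "D87953": "cluster_1",    # RTP
    "X92845": "cluster_1",    # DRG1
    "AF004162": "cluster_1",  # CAP43
}
CLUSTER_TO_LOCUS = {
    "cluster_1": "10397",     # Ndrg1 LocusID (from the text)
}


def locus_for(accession):
    """Map an element's accession to a LocusID via its cluster, or return None."""
    cluster = ACCESSION_TO_CLUSTER.get(accession)
    return CLUSTER_TO_LOCUS.get(cluster) if cluster else None


def cross_index(platforms):
    """Group (platform, element) pairs by the locus their accession maps to.

    platforms: dict of platform name -> dict of element ID -> GenBank accession.
    Elements whose accession maps to no locus are left out.
    """
    by_locus = defaultdict(list)
    for platform, elements in platforms.items():
        for element_id, accession in elements.items():
            locus = locus_for(accession)
            if locus is not None:
                by_locus[locus].append((platform, element_id))
    return dict(by_locus)


# Hypothetical element IDs on two array platforms.
platforms = {
    "array_A": {"spot_17": "D87953", "spot_42": "X92845"},
    "array_B": {"probe_903": "AF004162"},
}
print(cross_index(platforms))
# {'10397': [('array_A', 'spot_17'), ('array_A', 'spot_42'), ('array_B', 'probe_903')]}
```

Keying the groups on the locus rather than on any submitted name is what lets the differently named RTP, DRG1, and CAP43 elements fall together, both on one array and across arrays.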
Materials and Methods
Algorithm rationale. Most developers of microarrays, whether private or commercial (e.g., Affymetrix, Inc., Santa Clara, CA), provide for each array element (i.e., probe) a GenBank accession number indicating the sequence or clone that the element represents or is derived from. However, the descriptive information for such GenBank entries, or for the locus with which they are associated, may change as new information is deposited in the public databases, especially UniGene and LocusLink. Furthermore, UniGene and LocusLink can serve as sequence "Rosetta stones," in that a) UniGene collates accession numbers, b) UniGene integrates with LocusLink, c) LocusLink serves as a curated annotation database with canonical gene names and curated gene information, and d) LocusLink integrates with other information such as OMIM (Online Mendelian Inheritance in Man). To represent the best information for a particular microarray element, a cross-chip ID (XChipID) can be created based on UniGene and LocusLink information, as described below.
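The hierarchical priority can be sketched as follows, under stated assumptions. Curated LocusLink annotation wins when the element's UniGene cluster links to a locus, UniGene cluster annotation is used next, and the default GenBank description is the fallback. The dictionaries stand in for the databases; the cluster labels and the accessions `ACC_2` and `ACC_3` are hypothetical; and the `LL:`/`UG:`/`GB:` prefixes are an invented XChipID format for illustration only, so the article's own XChipID scheme may differ.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    source: str    # "LocusLink", "UniGene", or "GenBank"
    name: str
    xchip_id: str  # hypothetical ID format (LL:/UG:/GB: prefixes)


def best_annotation(accession, unigene, locuslink, genbank):
    """Pick one annotation for an array element, preferring curated sources.

    unigene:   accession -> (cluster ID, cluster title)    [hypothetical table]
    locuslink: cluster ID -> (LocusID, gene name)           [hypothetical table]
    genbank:   accession -> default GenBank description     [hypothetical table]
    """
    cluster = unigene.get(accession)
    if cluster is not None:
        cluster_id, cluster_title = cluster
        locus = locuslink.get(cluster_id)
        if locus is not None:
            locus_id, gene_name = locus
            # 1. Curated LocusLink annotation takes priority.
            return Annotation("LocusLink", gene_name, f"LL:{locus_id}")
        # 2. Otherwise use the UniGene cluster annotation.
        return Annotation("UniGene", cluster_title, f"UG:{cluster_id}")
    # 3. Last resort: the element's own GenBank description.
    return Annotation("GenBank", genbank.get(accession, ""), f"GB:{accession}")


# Hypothetical tables; only D87953, X92845, and LocusID 10397 come from the text.
unigene = {
    "D87953": ("cluster_1", "N-myc downstream regulated"),
    "X92845": ("cluster_1", "N-myc downstream regulated"),
    "ACC_2": ("cluster_2", "unnamed EST cluster"),
}
locuslink = {"cluster_1": ("10397", "Ndrg1")}
genbank = {"ACC_3": "unnamed cDNA clone"}

for acc in ["D87953", "X92845", "ACC_2", "ACC_3"]:
    a = best_annotation(acc, unigene, locuslink, genbank)
    print(acc, a.source, a.xchip_id)
# D87953 LocusLink LL:10397
# X92845 LocusLink LL:10397
# ACC_2 UniGene UG:cluster_2
# ACC_3 GenBank GB:ACC_3
```

Because elements whose accessions resolve to the same LocusID receive the same identifier, D87953 and X92845 share an XChipID here even though they were deposited under different names.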