Access to biomedical information: the Unified Medical Language System
Library Trends, Summer, 1993 by Steven J. Squires
Building the Metathesaurus
The selected thesauri were integrated by semi-automated lexical matching, combined with knowledge of the relationships among terms that is explicit in the structures of the source vocabularies. Tuttle et al. (1988, 1989), Sperzel and Tuttle (1989), and Sherertz et al. (1989b) demonstrate the utility of automated lexical matching for finding equivalencies among a diverse set of vocabularies. Machine-readable versions of the source vocabularies were obtained, and their terms were expressed in a uniform manner to facilitate lexical matching among them. The terms from the first set of source vocabularies were compared, and a single preferred term, or canonical term, was selected for each set of identical terms, lexically variant terms, or lexically variant synonyms of terms. Lexical variants are terms that differ only in case, number, word order, spelling, or punctuation. When terms from different source vocabularies were found to be identical, or only lexically variant forms of one another, the preferred term for the Metathesaurus entry was established by an order of precedence. If any term in a set of identical or lexically variant terms was a MeSH term, that term became the Meta entry. The vocabularies following MeSH in order of precedence were DSM-IIIR, SNOMED, ICD, CPT, LCSH, and COSTAR. Once a canonical term was determined, the other terms in the set of equivalent terms could be labeled as lexical variants, synonyms, or lexical variants of synonyms.
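The matching and precedence steps described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure the Metathesaurus builders actually ran: the `normalize` function, its naive plural stripping, and the sample terms are all hypothetical, and spelling variants are not handled at all.

```python
import re
from collections import defaultdict

# Order of precedence among source vocabularies, MeSH first.
PRECEDENCE = ["MeSH", "DSM-IIIR", "SNOMED", "ICD", "CPT", "LCSH", "COSTAR"]


def normalize(term: str) -> str:
    """Hypothetical lexical key that ignores case, punctuation, word order,
    and (very naively) number. Spelling variants are NOT handled."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    stems = [
        # Assumption: strip a trailing "s" from longer words to fold plurals.
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return " ".join(sorted(stems))


def choose_canonical(entries):
    """entries: iterable of (term, source) pairs.
    Returns {key: (canonical_term, canonical_source, other_terms)}."""
    groups = defaultdict(list)
    for term, source in entries:
        groups[normalize(term)].append((term, source))

    result = {}
    for key, members in groups.items():
        # The member whose source ranks highest in PRECEDENCE wins.
        # An unknown source raises ValueError, which is deliberate here.
        canonical = min(members, key=lambda m: PRECEDENCE.index(m[1]))
        others = [t for t, _ in members if t != canonical[0]]
        result[key] = (canonical[0], canonical[1], others)
    return result


# Invented sample terms, for illustration only.
entries = [
    ("Neoplasms, Lung", "SNOMED"),
    ("Lung Neoplasms", "MeSH"),
    ("lung neoplasm", "ICD"),
]
groups = choose_canonical(entries)
```

In this sketch all three sample terms collapse to one key, and the MeSH form is chosen as the canonical term because MeSH ranks first. Deciding whether the remaining terms are lexical variants, synonyms, or lexical variants of synonyms would need information the sketch does not model.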
The term relationships and information about terms that result from the processes described earlier, and from the human editing that followed, are stored in several relational database files. A database management system could nevertheless be devised to present all the information about a single term as one comprehensive entry, or record, for that term. This conceptual record structure of Meta would consist of entries for concepts, with fields or slots that contain terms related to the concepts or that describe or name attributes of concepts. Tuttle et al. (1989) enumerated the essential slots as follows:
Concept Name
Meta Unique Code(s)
Syntactic Category (part of speech)
Lexical Tag (if term is an abbreviation, acronym, etc.)
Semantic Type (assigned from Semantic Network)
Source Vocabulary or Vocabularies
Source Hierarchical Contexts
Source Definition(s)
Lexical Variants
Synonyms
Related Terms
Broader Terms
Narrower Terms
Other attributes of terms include use data (described later), data necessary for thesaurus maintenance, and, if the Meta term is a MeSH term, up to twenty-five data elements derived from the annotations in the MeSH vocabulary.
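The conceptual record described above, one comprehensive entry assembled from several relational files, might be sketched in Python as follows. The field names simply mirror the slots listed by Tuttle et al. (1989); the `MetaEntry` class, the `assemble` function, the `(concept, slot, value)` row layout, and all sample values are assumptions made for illustration, not the actual Meta file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetaEntry:
    """Hypothetical record with one field per essential slot."""
    concept_name: str
    meta_unique_codes: List[str] = field(default_factory=list)
    syntactic_category: Optional[str] = None   # part of speech
    lexical_tag: Optional[str] = None          # abbreviation, acronym, etc.
    semantic_type: Optional[str] = None        # from the Semantic Network
    source_vocabularies: List[str] = field(default_factory=list)
    source_hierarchical_contexts: List[str] = field(default_factory=list)
    source_definitions: List[str] = field(default_factory=list)
    lexical_variants: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    related_terms: List[str] = field(default_factory=list)
    broader_terms: List[str] = field(default_factory=list)
    narrower_terms: List[str] = field(default_factory=list)


def assemble(concept_name, rows):
    """Gather (concept, slot, value) rows, as if read from separate
    relational files, into a single entry for one concept."""
    entry = MetaEntry(concept_name)
    for name, slot, value in rows:
        if name != concept_name:
            continue
        current = getattr(entry, slot)  # AttributeError for an unknown slot
        if isinstance(current, list):
            current.append(value)       # multi-valued slot
        else:
            setattr(entry, slot, value)  # single-valued slot
    return entry


# Invented sample rows, for illustration only.
rows = [
    ("Lung Neoplasms", "source_vocabularies", "MeSH"),
    ("Lung Neoplasms", "source_vocabularies", "SNOMED"),
    ("Lung Neoplasms", "syntactic_category", "noun phrase"),
    ("Lung Neoplasms", "lexical_variants", "Neoplasms, Lung"),
    ("Lung Neoplasms", "broader_terms", "Respiratory Tract Neoplasms"),
    ("Another Concept", "synonyms", "ignored here"),
]
entry = assemble("Lung Neoplasms", rows)
```

The use data, maintenance data, and MeSH-derived data elements are left out of the sketch; they could be added as further optional fields.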
After the first version of Meta was compiled, the result was subjected to human editing, described by Sperzel et al. (1990). Semantic types and lexical categories were assigned at this step. Editors also reviewed the automated assignments of synonyms, related terms, broader or narrower terms, and lexical variants where these appeared obviously incorrect. The results of human editing had to have their own audit trail, so that new versions of the Metathesaurus computed from updated versions of the original source vocabularies would have the desired result (Sherertz et al., 1990). Tuttle et al. (1992) warn local users of Meta about the consequences of adding local terms to it, since these enhancements would have to be maintained over new releases. They call for a generally adopted standard updating method that would facilitate both local maintenance and the improvement of Meta.
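One way to picture the role of the audit trail is as a log of editing decisions that is replayed over each freshly computed version. The sketch below is an assumption about how such a replay could work, not a description of the system reported by Sherertz et al. (1990); the `Edit` record, the `replay` function, and the sample data are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Edit(NamedTuple):
    """Hypothetical audit-trail record of one human editing decision."""
    concept: str
    slot: str
    action: str  # "add" or "remove"
    value: str


Version = Dict[str, Dict[str, Set[str]]]


def replay(computed: Version, audit_trail: List[Edit]) -> Tuple[Version, List[Edit]]:
    """Apply logged edits to a freshly computed version.
    Returns the edited copy plus the edits that no longer apply."""
    # Copy so the automatically computed version stays untouched.
    result = {c: {s: set(v) for s, v in slots.items()} for c, slots in computed.items()}
    skipped = []
    for edit in audit_trail:
        if edit.concept not in result:
            skipped.append(edit)  # concept missing from the new version
            continue
        values = result[edit.concept].setdefault(edit.slot, set())
        if edit.action == "add":
            values.add(edit.value)
        elif edit.action == "remove":
            values.discard(edit.value)
        else:
            skipped.append(edit)  # unknown action
    return result, skipped


# Invented sample data, for illustration only.
computed = {"Lung Neoplasms": {"synonyms": {"Lung Cancer", "Lung"}}}
audit_trail = [
    Edit("Lung Neoplasms", "synonyms", "remove", "Lung"),
    Edit("Lung Neoplasms", "semantic_type", "add", "Neoplastic Process"),
    Edit("Retired Term", "synonyms", "add", "Some Synonym"),
]
edited, skipped = replay(computed, audit_trail)
```

Edits that target a concept absent from the new version are returned rather than silently dropped, so an editor can review them. Locally added terms raise the same maintenance problem, which is the concern behind the warning to local users.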