Web Services for Controlled Vocabularies

Bulletin of the American Society for Information Science & Technology, Jun/Jul 2006 by Vizine-Goetz, Diane, Houghton, Andrew, Childress, Eric

Amid the debates about whether folksonomies will supplant controlled vocabularies and whether the Library of Congress Subject Headings (LCSH) and Dewey Decimal Classification (DDC) system have outlived their usefulness, libraries, museums and other organizations continue to require efficient, effective access to controlled vocabularies for creating consistent metadata for their collections.

In this article, we present an approach for using Web services to interact with controlled vocabularies. Services are implemented within a service-oriented architecture (SOA) framework. SOA is an approach to distributed computing in which services are loosely coupled and discoverable on the network. A set of experimental services for controlled vocabularies is provided through the Microsoft (MS) Office Research task pane, a small window or sidebar that opens next to Internet Explorer (IE) and other MS Office applications. The Research task pane is a built-in feature of IE when MS Office 2003 is loaded. It enables a user to take advantage of a number of research and reference services accessible over the Internet. Other Web browsers, such as Mozilla Firefox and Opera, also provide sidebars that could be used to deliver similar, loosely coupled Web services.

Controlled Vocabularies

Many controlled vocabularies are available for representing the content of documents and other resources. Depending upon their needs, catalogers, metadata specialists and webmasters can choose from vocabularies that range from just a few terms, such as the Dublin Core Metadata Initiative (DCMI) Type Vocabulary, to many thousands of terms, such as the National Library of Medicine's Medical Subject Headings (MeSH) or the Library of Congress Subject Headings. As part of the Terminology Services (TS) project, OCLC researchers are prototyping Web services for various types of knowledge organization schemes, including classification data, subject heading systems, thesauri and lists of form and genre terms. Over the last 18 months, OCLC Research has made the following controlled vocabularies accessible via one or more Web services:

* DCMI Type Vocabulary

* Guidelines on Subject Access to Individual Works of Fiction, Drama, etc. (GSAFD) list of form/genre headings

* Library of Congress Subject Headings (LCSH)

* Library of Congress Annotated Card Program AC Subject Headings (LCSHac)

* Medical Subject Headings (MeSH) 2005

* Medical Subject Headings (MeSH) 2005 Sample

* Medical Subject Headings (MeSH) 2006

* Newspaper Genre List (NGL)

* Radio Form/Genre Terms Guide (RADFG)

* Répertoire de vedettes-matière (RVM) (access restricted)

* Union List of Artist Names (ULAN) Sample

For the project, all of the controlled vocabularies were encoded in the MARC 21 Format for Authority Data in XML. The MARC 21 Authority Format was chosen because it enabled us to code common controlled vocabulary elements, such as preferred and non-preferred terms, term relationships, term mappings, the source of the content and the origin of changes. For some vocabularies it was first necessary to convert the controlled vocabulary data from word processing documents or HTML pages to more structured data formats and then into MARC 21. A sample term from the DCMI Type vocabulary, originally available only as HTML and Resource Description Framework (RDF), is shown in MARC 21 in XML in Figure 1.

The DCMI Type Vocabulary is a controlled list of terms that can be used as values for the DCMI Resource Type element to identify the genre of a resource. Data field tag "040" subfield code "a" contains the MARC organization code for DCMI, the originator of the content; subfield code "c" contains the code for OCLC Research, the party responsible for converting the content to the MARC format. The genre term Image is coded in tag "155" and the associated genre term Still Image is coded in tag "555."
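The field layout described above can be sketched in code. The following Python example is only an illustration, not the project's conversion tooling: it builds a MARCXML fragment with the three data fields named in the text. The two organization codes are placeholders because the article does not give the actual values, the blank indicators are an assumption, and the leader and control fields of a complete authority record are omitted.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARCXML ("MARC 21 slim") schema.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def add_datafield(record, tag, subfields, ind1=" ", ind2=" "):
    """Append one data field with its subfields to a record element."""
    field = ET.SubElement(
        record,
        f"{{{MARC_NS}}}datafield",
        {"tag": tag, "ind1": ind1, "ind2": ind2},
    )
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field


def build_image_record():
    """Build a partial record for the DCMI Type term Image."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    # Placeholder organization codes: the real MARC codes are not
    # stated in the article, so these strings are hypothetical.
    add_datafield(record, "040", [("a", "DCMI-ORG-CODE"),
                                  ("c", "OCLC-RESEARCH-ORG-CODE")])
    # Genre term (heading) and associated genre term, as described above.
    add_datafield(record, "155", [("a", "Image")])
    add_datafield(record, "555", [("a", "Still Image")])
    return record


record = build_image_record()
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The helper keeps tags, indicators and subfield codes as plain data, so the same function could emit fields for other vocabularies without changes.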

For vocabularies already available in MARC 21, the conversion to MARCXML was a relatively straightforward process. Some problems were encountered with XML and XSLT tools when processing the larger vocabularies (more than 100,000 records), especially after the files were enhanced with the vocabulary's full reference structure, term mappings and links to external Web sites. Once coded as XML, the data could be used as the basis for Web services. SKOS (Simple Knowledge Organization System) Core, an emerging RDF schema for thesauri and related knowledge organization schemes, and the Zthes 0.5 schema, a Z39.50 profile for thesaurus navigation, are also suitable formats for encoding vocabulary resources for Web services. Phase III of the High-Level Thesaurus (HILT) project is an example of a project that is using SKOS Core for encoding controlled vocabularies and classification data. MARC and Zthes formats may be added to HILT at a later stage. The Zthes 0.5 encoding for DCMI Type value Image is shown in Figure 2.
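To make concrete how XML-encoded records can serve as the basis for a service, here is a minimal Python sketch that reads a MARCXML record and returns the preferred term and its associated terms as a simple dictionary. The sample record, the function names and the shape of the result are assumptions made for illustration; they do not describe the services OCLC Research actually built.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record containing only the two term fields
# discussed in the article for the DCMI Type value Image.
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="155" ind1=" " ind2=" ">
    <marc:subfield code="a">Image</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="555" ind1=" " ind2=" ">
    <marc:subfield code="a">Still Image</marc:subfield>
  </marc:datafield>
</marc:record>"""


def subfield_values(record, tag, code="a"):
    """Return the text of every matching subfield in the given field tag."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, NS)]


def summarize(record_xml):
    """Reduce a genre/form record to a preferred term plus related terms."""
    record = ET.fromstring(record_xml)
    preferred = subfield_values(record, "155")
    return {
        "preferred": preferred[0] if preferred else None,
        "related": subfield_values(record, "555"),
    }


summary = summarize(SAMPLE)
print(summary)
```

A real service would also need to handle the other heading tags used by subject and name vocabularies; this sketch covers only the genre/form fields mentioned in the text.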

User Interface

The implementation of Web services support in many widely adopted platforms presents opportunities to offer terminology Web services in various modular arrangements. OCLC Research is making a set of services for controlled vocabularies available through the Microsoft Office Research task pane. To use the OCLC TS pilot vocabularies, users add OCLC services to the research pane via a URL provided to pilot participants. Within the research pane, pilot users can search a given vocabulary, display information about a term, follow links to associated terms within a vocabulary and follow links to external Web sites. Because the pilot implementation is intended to be used alongside the user's cataloging or metadata editing application, multiple copy and paste operations are provided. Users can insert controlled vocabulary terms with MARC field tags, indicators and subfield codes into MARC catalog records or, for non-MARC applications, insert terms as plain strings without MARC coding.
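The two paste options can be illustrated with a short Python sketch. The function name, the "#" notation for blank indicators, the "$" subfield delimiter and the "--" separator for plain strings are all assumptions chosen for the example; the article does not specify how the pilot formats pasted text.

```python
def format_for_paste(tag, indicators, subfields,
                     with_marc_coding=True, separator="--"):
    """Format a term for pasting into a record.

    With MARC coding, the result carries the field tag, indicators and
    subfield codes. Without it, only the term text is returned, with
    multiple subfield values joined by `separator`.
    """
    if not with_marc_coding:
        return separator.join(value for _, value in subfields)
    # Show blank indicators as "#" so they stay visible in plain text
    # (a display convention assumed for this sketch).
    shown_indicators = indicators.replace(" ", "#")
    coded = " ".join(f"${code} {value}" for code, value in subfields)
    return f"{tag} {shown_indicators} {coded}"


# The genre term Image, first with MARC coding, then as a plain string.
marc_text = format_for_paste("155", "  ", [("a", "Image")])
plain_text = format_for_paste("155", "  ", [("a", "Image")],
                              with_marc_coding=False)
print(marc_text)
print(plain_text)
```

Keeping both outputs behind one function means the calling application only has to choose a flag, which mirrors the choice between MARC and non-MARC paste operations described above.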

 

Content provided in partnership with ProQuest