A survey of metadata research for organizing the web

Library Trends,  Fall, 2003  by Jane L. Hunter

ABSTRACT

This article provides an overview of the key metadata research issues and of the current projects and initiatives that are investigating methods and developing technologies to improve our ability to discover, access, retrieve, and assimilate information on the Internet through the use of metadata.

1. INTRODUCTION

The rapid expansion of the Internet has led to a demand for systems and tools that can satisfy the more sophisticated requirements for storing, managing, searching, accessing, retrieving, sharing, and tracking complex resources of many different formats and media types.

Metadata is the value-added information that documents the administrative, descriptive, preservation, technical, and usage history and characteristics associated with resources. It provides the underlying foundation upon which digital asset management systems rely to provide fast, precise access to relevant resources across networks and between organizations. The metadata required to describe the highly heterogeneous, mixed-media objects on the Internet is far more complex than the simple metadata used for resource discovery of textual documents through a library database. The problems and costs associated with generating and exploiting such metadata are correspondingly magnified.

Metadata standards, such as Dublin Core, provide a limited level of interoperability between systems and organizations to enable simple resource discovery. However, many problems and issues remain to be solved. Cory Doctorow (2001) believes that the vision of an Internet in which everyone describes their goods, services, or information using concise, accurate, and common or standardized metadata that is universally understood by both machines and humans is a "pipe-dream, founded on self-delusion, herd hubris and hysterically inflated market opportunities." Others cite the popularity and efficiency of Google as an example of an extremely successful search engine that does not depend on expensive and unreliable metadata. Google combines PageRank (in which the relative importance of a document is measured by the number of links to it) with sophisticated text-matching techniques to retrieve precise, relevant, and comprehensive search results (Brin & Page, 1998).
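The link-based ranking idea mentioned above can be illustrated with a minimal sketch. It is a simplified power-iteration version of PageRank, not Google's implementation: the function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
most_important = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page it links to inherits much of that importance, which is why a raw link count alone is only an approximation of the measure.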

Some of the major disadvantages of metadata are cost, unreliability, subjectivity, lack of authentication, and lack of interoperability with respect to syntax, semantics, vocabularies, languages, and underlying models. However, there are many researchers currently investigating strategies to overcome different aspects of these limitations in an effort to provide more efficient means of organizing content on the Internet. Other researchers are investigating metadata to describe the new types of real-time streaming content being generated by emerging broadband and wireless applications to enable both push and pull of this content based on users' needs. The goal of this article is to provide an overview of some of the key metadata research underway that is expected to improve our ability to search, discover, retrieve, and assimilate relevant information on the Internet regardless of the domain or format.

2. THE KEY RESEARCH AREAS

This section identifies what I consider to be some of the key metadata research areas, both now and over the next few years. The following subsections briefly describe the work being undertaken and give some key citations for each of the research areas summarized in the list below:

* Extensible Markup Language (XML)--XML and its associated technologies--XML Namespaces, XML Query languages, and XML Databases--are enabling implementers to develop metadata application profiles (XML Schemas) that combine metadata terms from different namespaces to satisfy the needs of a particular community or application. Large-scale XML descriptions of content are being stored in XML Databases and can be queried using XML Query languages. These technologies are key to enabling the automated computer processing, integration, and exchange of information.

* Semantic Web technologies--"The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" (Berners-Lee, Hendler, & Lassila, 2001). There are two main building blocks for the Semantic Web:

** Formal languages--RDF (Resource Description Framework), DAML+OIL, and OWL (Web Ontology Language), which is being developed by the Web Ontology Working Group of the W3C.

** Ontologies--communities will use the formal languages to define both domain-specific ontologies and top-level ontologies to enable relationships between ontologies to be determined for cross-domain searching, exchange, and information integration.

* Web Services--using open standards such as WSDL, UDDI, and SOAP, Web services will enable the building of software applications without having to know who the users are, where they are, or anything else about them.
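As an illustration of the XML item in the list above, the following minimal sketch builds a single metadata record that mixes terms from two namespaces and then reads the terms back by namespace. The Dublin Core element set namespace is the published one; the `http://example.org/terms/` namespace, the `mediaType` element, and the `record` wrapper element are hypothetical and stand in for whatever a particular community might define in its own application profile.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
EX = "http://example.org/terms/"         # hypothetical community namespace

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("dc", DC)
ET.register_namespace("ex", EX)

# Build one record that combines terms from both namespaces.
record = ET.Element("record")  # hypothetical wrapper element
ET.SubElement(record, f"{{{DC}}}title").text = (
    "A survey of metadata research for organizing the web"
)
ET.SubElement(record, f"{{{DC}}}creator").text = "Jane L. Hunter"
ET.SubElement(record, f"{{{EX}}}mediaType").text = "text"

xml_text = ET.tostring(record, encoding="unicode")

# A consumer resolves each term through its namespace, not its prefix.
ns = {"dc": DC, "ex": EX}
parsed = ET.fromstring(xml_text)
title = parsed.find("dc:title", ns).text
creator = parsed.find("dc:creator", ns).text
media_type = parsed.find("ex:mediaType", ns).text
```

Because each term is qualified by its namespace, a system that understands only Dublin Core can still extract the title and creator and ignore the community-specific element, which is the limited form of interoperability an application profile is meant to preserve.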