Can document-genre metadata improve information access to large digital collections?

Library Trends, Fall, 2003 by Kevin Crowston, Barbara H. Kwasnik

ABSTRACT

WE DISCUSS THE ISSUES OF RESOLVING the information-retrieval problem in large digital collections through the identification and use of document genres. Explicit identification of genre seems particularly important for such collections because any search usually retrieves documents with a diversity of genres that are undifferentiated by obvious clues as to their identity. Also, because most genres are characterized by both form and purpose, identifying the genre of a document provides information as to the document's purpose and its fit to the user's situation, which can be otherwise difficult to assess. We begin by outlining the possible role of genre identification in the information-retrieval process. Our assumption is that genre identification would enhance searching, first because we know that topic alone is not enough to define an information problem and, second, because search results containing genre information would be more easily understandable. Next, we discuss how information professionals have traditionally tackled the issues of representing genre in settings where topical representation is the norm. Finally, we address the issues of studying the efficacy of identifying genre in large digital collections. Because genre is often an implicit notion, studying it in a systematic way presents many problems. We outline a research protocol that would provide guidance for identifying Web document genres, for observing how genre is used in searching and evaluating search results, and finally for representing and visualizing genres.

INTRODUCTION

Current computerized information-access systems face a fundamental limitation: they know what documents say but not what they mean or for what purposes they might be useful. Extracting and representing the meaning of documents is difficult and time consuming, and automatic systems still have significant limitations. We note, though, that humans rarely have to read every word of a document to understand its purpose. Instead, people take a shortcut: they start by identifying the kinds of documents they are faced with (i.e., the document's genre), and then use different types of documents in appropriate ways. For example, a grant proposal is used differently from a syllabus, a product brochure, or a bank statement. Accordingly, differences in an information situation are often reflected in the kind of document that is considered helpful (e.g., a problem set, a lesson plan, and a tutorial about mathematics are all about math but useful in different situations). Information-access systems would be more useful for many tasks if they could similarly distinguish the purpose of documents and handle them in appropriate ways.

In this paper we discuss the possibility of improving information access in large digital collections through the identification and use of document genre as a facet of document and query representation. First, we provide some historical background on the concept of genre and the approach it provides to the problem of incorporating context into information retrieval. We outline the framework of the information-retrieval problem with respect to genre and some traditional resolutions that have been attempted. Finally, we outline a research agenda that addresses some of the questions and issues that investigating genre entails.
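To make the notion of genre as a facet of document and query representation concrete, the following is a minimal sketch only, not a system described in this paper. The names `Document`, `Query`, `search`, and `COLLECTION`, the sample records, and the exact-match rule are all hypothetical. It uses the mathematics example above: a topic-only query returns documents of every genre, while adding a genre facet narrows the result to the kind of document that fits the situation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Document:
    """A document represented by topic plus a genre facet (hypothetical schema)."""
    title: str
    topic: str
    genre: str


@dataclass(frozen=True)
class Query:
    """A query with a required topic and an optional genre facet."""
    topic: str
    genre: Optional[str] = None


def search(collection: List[Document], query: Query) -> List[Document]:
    """Return documents matching the topic and, if given, the genre."""
    return [
        doc
        for doc in collection
        if doc.topic == query.topic
        and (query.genre is None or doc.genre == query.genre)
    ]


# Illustrative records only; these are assumptions, not data from the study.
COLLECTION = [
    Document("Fractions practice", "mathematics", "problem set"),
    Document("Teaching fractions, week 3", "mathematics", "lesson plan"),
    Document("Understanding fractions", "mathematics", "tutorial"),
    Document("Spring term reading list", "literature", "syllabus"),
]

# Topic alone: all three mathematics documents, undifferentiated by genre.
topic_only = search(COLLECTION, Query("mathematics"))

# Topic plus genre: only the document suited to a teacher planning a class.
for_teacher = search(COLLECTION, Query("mathematics", "lesson plan"))
```

Exact string matching on genre is a simplification; as the discussion of genre suggests, genre labels are often implicit and socially defined, so a real system would need a far richer representation.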

THEORY: DOCUMENT GENRE

Rhetoricians since Aristotle have attempted to classify communications with similar form or purpose into types or "genres." Numerous definitions of genre, or discourse type, have been suggested (e.g., Longacre, 1983; Miller, 1984; Swales, 1990). In our discussion, we draw on the definition of genre proposed by Orlikowski and Yates (1994), who describe genre as "a distinctive type of communicative action, characterized by a socially recognized communicative purpose and common aspects of form" (p. 543). For instance, this document is an example of the journal article genre. It has a form familiar to most researchers and practitioners and is monitored by the journal's editorial policies as well as the profession's communication practices. There are many document genres: some common, such as a report or a newsletter, and others restricted to specific domains, such as the course syllabus or a problem set in higher education. Genre is applicable to electronic as well as physical documents. For example, in a study of Web documents, Crowston and Williams (2000) were able to identify documents of many familiar genres and of a few genres that seemed to be new to the Web, such as the home page (Dillon & Gushrowski, 2000) or the hotlist.

Genre is useful because it makes documents more easily recognizable and understandable to recipients, thus reducing the cognitive load of processing them (Bartlett, 1932 [1967]). Yates and Sumner (1997) argue that, on the Web, genres help in both the production and consumption of documents because genre adds "fixity" in a medium that does not otherwise distinguish very well between text types (say, a book and a Post-it). In our preliminary studies of people searching the Web (Roussinov, Crowston, Nilan, Kwasnik, Liu, & Cai, 2001), we observed that the genre of the document was one of the clues used in assessing document relevance, value, quality, and usefulness.


 


Content provided in partnership with Thompson Gale