Can document-genre metadata improve information access to large digital collections?
Library Trends, Fall 2003, by Kevin Crowston and Barbara H. Kwasnik
ABSTRACT
We discuss the issues of resolving the information-retrieval problem in large digital collections through the identification and use of document genres. Explicit identification of genre seems particularly important for such collections because any search usually retrieves documents with a diversity of genres that are undifferentiated by obvious clues as to their identity. Also, because most genres are characterized by both form and purpose, identifying the genre of a document provides information as to the document's purpose and its fit to the user's situation, which can be otherwise difficult to assess. We begin by outlining the possible role of genre identification in the information-retrieval process. Our assumption is that genre identification would enhance searching, first because we know that topic alone is not enough to define an information problem and, second, because search results containing genre information would be more easily understandable. Next, we discuss how information professionals have traditionally tackled the issues of representing genre in settings where topical representation is the norm. Finally, we address the issues of studying the efficacy of identifying genre in large digital collections. Because genre is often an implicit notion, studying it in a systematic way presents many problems. We outline a research protocol that would provide guidance for identifying Web document genres, for observing how genre is used in searching and evaluating search results, and finally for representing and visualizing genres.
INTRODUCTION
Current computerized information-access systems face a fundamental limitation: they know what documents say but not what they mean or for what purposes they might be useful. Extracting and representing the meaning of documents is difficult and time consuming, and automatic systems still have significant limitations. We note, though, that humans rarely have to read every word of a document to understand its purpose. Instead, people take a shortcut: they start by identifying the kinds of documents they are faced with (i.e., the document's genre), and then use different types of documents in appropriate ways. For example, a grant proposal is used differently from a syllabus, a product brochure, or a bank statement. Accordingly, differences in an information situation are often reflected in the kind of document that is considered helpful (e.g., a problem set, a lesson plan, and a tutorial about mathematics are all about math but useful in different situations). Information-access systems would be more useful for many tasks if they could similarly distinguish the purpose of documents and handle them in appropriate ways.
In this paper we discuss the possibility of improving information access in large digital collections through the identification and use of document genre as a facet of document and query representation. First, we provide some historical background on the concept of genre and the approach it provides to the problem of incorporating context into information retrieval. We outline the framework of the information-retrieval problem with respect to genre and some traditional resolutions that have been attempted. Finally, we outline a research agenda that addresses some of the questions and issues that investigating genre entails.
THEORY: DOCUMENT GENRE
Rhetoricians since Aristotle have attempted to classify communications with similar form or purpose into types or "genres." Numerous definitions of genre, or discourse type, have been suggested (e.g., Longacre, 1983; Miller, 1984; Swales, 1990). In our discussion, we draw on the definition of genre proposed by Orlikowski and Yates (1994), who describe genre as "a distinctive type of communicative action, characterized by a socially recognized communicative purpose and common aspects of form" (p. 543). For instance, this document is an example of the journal article genre. It has a form familiar to most researchers and practitioners and is monitored by the journal's editorial policies as well as the profession's communication practices. There are many document genres: some common, such as a report or a newsletter, and others restricted to specific domains, such as the course syllabus or a problem set in higher education. Genre is applicable to electronic as well as physical documents. For example, in a study of Web documents, Crowston and Williams (2000) were able to identify documents of many familiar genres and of a few genres that seemed to be new to the Web, such as the home page (Dillon & Gushrowski, 2000) or the hotlist.
Genre is useful because it makes documents more easily recognizable and understandable to recipients, thus reducing the cognitive load of processing them (Bartlett, 1932 [1967]). Yates and Sumner (1997) argue that, on the Web, genres help in both the production and consumption of documents because genre adds "fixity" in a medium that does not otherwise distinguish very well between text types (say, a book and a Post-it). In our preliminary studies of people searching the Web (Roussinov, Crowston, Nilan, Kwasnik, Liu, & Cai, 2001), we observed that the genre of the document was one of the clues used in assessing document relevance, value, quality, and usefulness.



