The most influential paper Gerard Salton never wrote

Library Trends, Spring, 2004 by David Dubin

RETRIEVAL MODELS

In 1968 Salton published Automatic Information Organization and Retrieval, a book that presents a more developed treatment of the concepts introduced in the earlier IR papers and more details on the design and evaluation of the SMART system. Salton devotes chapter 6 entirely to retrieval models but, interestingly, that chapter contains none of the vector or matrix notation seen in the earlier papers. This is not to say that vector representations are absent from the book: as in the earlier writings, they appear in the context of explaining specific computations in the chapters on statistical operations (4) and the retrieval process (7). But for Salton a retrieval model was closer to the formal model later presented by Bookstein and Cooper (Bookstein & Cooper, 1976). (1) Retrieval models, according to this understanding, are more abstract than particular computations. The retrieval operation is understood as a mapping between the space of query words and the space of documents (that is, replacement of the former by the latter). Salton presents retrieval models in set-theoretic terms, though there is no reason why vectors could not be used to model retrieval at the same level of abstraction: John W. Sammon Jr. published an abstract model similar to Salton's using vectors rather than sets (Sammon, 1968). According to Salton, a retrieval model should explicate such issues as

* whether a particular set has a well-defined complement

* whether the request space is identical with the object space; that is, whether the set of possible query descriptions is the same as the set of possible document descriptions

* whether document and query identifiers are unstructured and independent of one another or whether relations between them are defined

* implications of order relations on queries and documents, such as whether a more specific query guarantees the retrieval of fewer documents and whether those documents will be a proper subset of the ones retrieved by a more general query

* whether the system contains a classification language (that is, a set of categories distinct from the document description language) and functions to map document and request descriptions into those categories

* whether elements of the description languages are all positive properties, or whether negation can be expressed independent of any other existing property
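To make the set-theoretic framing concrete, here is a minimal sketch of two of the questions above: the order relation between a general and a more specific query, and the complement of a retrieved set. It is an illustration under stated assumptions, not Salton's own formulation: the toy collection `DOCS`, the function `retrieve`, and the choice of conjunctive matching (a document is retrieved only if it carries every query term) are all hypothetical.

```python
# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    "d1": {"retrieval", "vector", "evaluation"},
    "d2": {"retrieval", "indexing"},
    "d3": {"vector", "matrix"},
}


def retrieve(query, docs=DOCS):
    """Return the ids of documents whose term set contains every query term.

    Conjunctive matching is an assumption made for this sketch only.
    """
    return {doc_id for doc_id, terms in docs.items() if query <= terms}


general = frozenset({"retrieval"})
specific = frozenset({"retrieval", "vector"})  # superset of terms: more specific

general_hits = retrieve(general)
specific_hits = retrieve(specific)

# Order relation: under conjunctive matching, adding terms can never add documents.
is_subset = specific_hits <= general_hits

# Complement: well defined here only because the collection is finite and known.
not_retrieved = set(DOCS) - general_hits
```

Under this matching rule the more specific query always retrieves a subset of what the general query retrieves, but not necessarily a proper subset: if every document containing "retrieval" also contained "vector", the two result sets would be identical. Whether such guarantees hold is a property of the model, which is why it is listed as something a retrieval model should make explicit.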

A retrieval model, according to Salton, represents documents, description features (such as index terms), queries, and the relationships within and across those sets. The vector spaces described in the 1968 book, however, are not models of documents, terms, or queries: they are models of numeric data and of computations with those data. The numbers represent the documents, terms, and queries within a system such as SMART. The vector space models are explanatory devices intended to help the reader understand how part of a system works; the retrieval model speaks to more general questions, such as those listed above.
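The distinction can be illustrated with a small sketch of the kind of computation a vector representation explains. Everything specific here is an assumption for illustration: the term list, the weights, and the use of the cosine measure are hypothetical choices, since this passage does not say which similarity function or weighting the book describes.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: an empty vector matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical term order and weights; each position stands for one index term.
TERMS = ["retrieval", "vector", "matrix"]
doc_vectors = {
    "d1": [2.0, 1.0, 0.0],
    "d2": [0.0, 1.0, 1.0],
}
query_vector = [1.0, 0.0, 0.0]

# Rank documents by decreasing similarity to the query.
ranking = sorted(
    doc_vectors,
    key=lambda doc_id: cosine(query_vector, doc_vectors[doc_id]),
    reverse=True,
)
```

Nothing in this code says what a document, a term, or a query is, or how the request space relates to the object space. It only manipulates numbers that stand for those things, which is the sense in which the vectors model data and computations rather than the retrieval situation itself.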

Some of the retrieval modeling issues have since recurred in disputes that intimately couple them with those of vector representations (as explained below). But in 1968 Salton treated these modeling issues separately from those used to characterize similarity and relevance feedback computations.

 


Content provided in partnership with Thompson Gale