The most influential paper Gerard Salton never wrote

Library Trends, Spring, 2004 by David Dubin

EARLY REACTIONS TO THE VECTOR MODEL

Responses and counterproposals to the vector model from 1979 onward are each interesting on their own terms and for their own reasons (Koll, 1979; Wong et al., 1986, 1985; Bollmann-Sdorra & Raghavan, 1998). But seeing how they shaped our conception of the VSM itself requires attention to two issues:

1. Respondents did not realize how recently the VSM came to be characterized as a retrieval model. Looking back at the earlier illustrative vector models of similarity and relevance feedback computations, they assumed the VSM went back at least as far as the 1975 TDV papers discussed above.

2. The IR modeling issues were no longer distinct from the computational modeling issues, as they had been in 1968.

The most significant early response to Salton was Wong and Raghavan's "Vector Space Model for Information Retrieval: A Reevaluation" (1984). This paper pointed to inconsistencies in earlier proposals for defining vector correlation. It was the first in a series of papers that proposed a different method for using word co-occurrence data to define an orthogonal basis for a vector space; Wong and Raghavan called it the Generalized Vector Space Model (GVSM) (Wong et al., 1985; Raghavan & Wong, 1986; Wong et al., 1987).

Beyond these contributions, it is interesting to look at how Wong and Raghavan interpreted Salton's earlier writings and to see the impact of this interpretation on how we conceive of the VSM today. Reviewing the 1960s and 1970s publications, Wong and Raghavan suggest that Salton's vectors are informal notational devices, not intended as a logical tool. They accuse Salton of ignoring issues such as whether the algebraic axioms defining a vector space are even satisfied. According to Wong and Raghavan (1984), that amounts to "casual flirtings" (p. 170) with the concept of vector spaces and should not be taken seriously. These criticisms are understandable in light of how they are interpreting the earlier publications.

As stated earlier, in the pre-1979 writings, vectors are used for modeling term frequency observations and for explaining similarity and relevance feedback computations. Salton's vector spaces are rigorous and formally correct, but the vector models themselves are illustrative (not merely notational). The axioms defining a vector space are satisfied simply because at the algebraic level the vector space in question is the familiar Euclidean space of real numbers. The orthogonality of the basis follows from definition, since what a vector space represents is nothing more than how computations are performed by a system such as SMART.
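The point about orthogonality following from definition can be made concrete with a small sketch. It is not SMART's code; the vocabulary, the term weights, and the function names are hypothetical, chosen only for illustration. Each term is assigned one coordinate of ordinary Euclidean space, so the term basis is orthonormal by construction, and similarity is the familiar cosine.

```python
import math

def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either is the zero vector."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)

# Hypothetical three-term vocabulary: one coordinate per term.
vocabulary = ["retrieval", "vector", "library"]
n = len(vocabulary)

# The basis vector for term i is 1 in position i and 0 elsewhere.
basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]

# Made-up term weights for one document and one query.
doc = [2.0, 1.0, 0.0]
query = [1.0, 1.0, 0.0]

# Distinct basis vectors have inner product 0. Nothing about the text
# collection was consulted to establish that; it is how the coordinates
# were set up.
for i in range(n):
    for j in range(n):
        expected = 1.0 if i == j else 0.0
        assert dot(basis[i], basis[j]) == expected

print(cosine(doc, query))
```

Nothing in the sketch says whether the terms are statistically independent in real text. The orthogonality describes how the computation is arranged, which is the sense in which the model is illustrative.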

Wong and Raghavan are looking back with the assumption that the VSM has been an IR model all along. From that perspective, they reasonably ask whether the VSM implies a vector space in the formal sense. But in reality the formality of the vector space was never in doubt, only what was meant by an IR model.

Wong and Raghavan's GVSM is a perfectly reasonable proposal for using word co-occurrence data in an IR system. But they present it as a formal model for vector correlation and orthogonality in IR. The issues of dependencies and patterns in textual data take a back seat to questions of how linear dependence, projection, and correlation are defined. What began as an illustrative formalism came to significantly shape the way theoretical questions were expressed and the language in which solutions were proposed.
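To illustrate what it means to let word co-occurrence data bear on similarity, here is a deliberately simplified sketch. It is not the GVSM construction from Wong et al. (1985), which derives its term vectors differently. It only shows the general idea of replacing the assumption of pairwise orthogonal terms with a matrix of term-term correlations. The toy term-document counts, the cosine-based correlation, and all function names are assumptions made for the example.

```python
import math

def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def term_correlations(term_doc):
    """Cosine between every pair of term rows (an assumed correlation measure)."""
    n = len(term_doc)
    norms = [math.sqrt(dot(row, row)) for row in term_doc]
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                gram[i][j] = dot(term_doc[i], term_doc[j]) / (norms[i] * norms[j])
    return gram

def bilinear_similarity(query, doc, gram):
    """Unnormalized similarity: sum over i, j of query[i] * gram[i][j] * doc[j]."""
    return sum(
        query[i] * gram[i][j] * doc[j]
        for i in range(len(query))
        for j in range(len(doc))
    )

# Toy data: rows are terms, columns are documents (made-up counts).
term_doc = [
    [1.0, 1.0, 0.0, 0.0],  # term 0
    [1.0, 0.0, 0.0, 0.0],  # term 1, co-occurs with term 0 in the first document
    [0.0, 0.0, 1.0, 1.0],  # term 2, never co-occurs with terms 0 or 1
]

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
gram = term_correlations(term_doc)

query = [1.0, 0.0, 0.0]  # uses only term 0
doc = [0.0, 1.0, 0.0]    # uses only term 1

# With orthogonal terms the query and document share nothing.
orthogonal_score = bilinear_similarity(query, doc, identity)
# With correlated terms the co-occurrence of terms 0 and 1 contributes.
correlated_score = bilinear_similarity(query, doc, gram)

print(orthogonal_score, correlated_score)
```

With the identity matrix the query and document have no terms in common and score zero; with the correlation matrix they score above zero because their terms co-occur. Choosing how that matrix is derived from the data is a question about dependencies in text, separate from the vector-space formalism used to express it.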

 
