The most influential paper Gerard Salton never wrote
Library Trends, Spring, 2004 by David Dubin
VECTOR SPACES AND MATHEMATICAL MODELS
We begin with a description of the VSM that Salton included in chapter 10 of his 1989 book on automatic text processing. That treatment includes the following characterization:
1. The VSM (like the Boolean and probabilistic models) represents information retrieval systems and procedures.
2. Global measures of similarity (such as the cosine measure) are computed between queries and documents.
3. Queries and documents are represented by term sets.
4. Both queries and documents can then be represented as ordered term vectors.
5. The components of the vectors are numbers representing either the importance of a term or simply the presence or absence of a term (1 or 0, respectively).
As mentioned above, the origins of these features are considerably earlier than the publications usually credited with the definition of the VSM. Salton himself, however, did not publish a full articulation of the VSM as a retrieval model until this chapter, which appeared years after he was publicly credited with having invented the model.
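As a rough illustration of points 2 through 5 of the characterization above, the following Python sketch builds ordered term vectors from term sets and compares them with the cosine measure. The vocabulary, the sample documents, and the function names term_vector and cosine are assumptions made for this example only; they are not taken from Salton's chapter.

```python
import math

def term_vector(terms, vocabulary, weights=None):
    """Return an ordered vector over `vocabulary`.

    Each component is the term's weight if the term is present
    (1.0 when no weight is supplied) and 0.0 if it is absent.
    """
    weights = weights or {}
    present = set(terms)
    return [weights.get(t, 1.0) if t in present else 0.0 for t in vocabulary]

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)

# Hypothetical vocabulary and term sets, for illustration only
vocabulary = ["boat", "current", "retrieval", "vector"]
query = term_vector({"vector", "retrieval"}, vocabulary)
doc_a = term_vector({"vector", "retrieval", "boat"}, vocabulary)
doc_b = term_vector({"boat", "current"}, vocabulary)

# Rank the documents by their similarity to the query
ranking = sorted(
    [("doc_a", cosine(query, doc_a)), ("doc_b", cosine(query, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
```

Here the components are binary (presence or absence); passing a weights dictionary to term_vector would instead give components that represent term importance, as in point 5.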
The VSM is a mathematical model. Generalizing a definition by Rutherford Aris, Davis and Hersh (1981) define a mathematical model as a consistent mathematical structure designed to correspond to some physical, biological, social, psychological, or conceptual entity. They cite a number of uses for mathematical models, including:
1. predicting events in the physical world
2. guiding observation or experimentation
3. fostering conceptual understanding
4. assisting the "axiomatization of the physical situation" (Davis & Hersh, 1981, p. 78)
5. promoting progress in mathematics
So there are any number of ways in which the VSM might represent an advance for or contribution to IR research or systems design. To clarify the particular role it plays as a model, it helps to look more closely at how vector representations are used in other domains. The vector space is a very general and flexible abstraction, used to model many different domains and applications. When one claims that a system or phenomenon is or can be modeled by a vector space, the first question to consider is the level of abstraction at which that claim is being made:
Algebraic--At the most abstract level, it can be a claim about addition and multiplication operations defined on a nonempty set of objects. Specifically, it is the claim that these operations satisfy all the algebraic axioms for a vector space (for example, addition commutes, multiplication distributes over addition, etc.). An example of a claim at this level is that the set of polynomials of degree no greater than n defines a vector space (Lay, 1994).
Measurement-theoretic--At another level, to say that something is represented by a vector space can be an empirical claim that two or more variables define a space. In that case, the substance of the claim is about ordinal and additive relations holding among the values of those variables for some known entities (that is, that the variables are quantitative) and also that distance between the entities is a function of the differences along each of the individual variables defining the space (Michell, 1990).
Physical--Real vector spaces are often used to model physical forces such as gravity and relations such as velocity. For example, the direction and speed of a boat may be represented by a vector, the speed and direction of the current are represented by a second vector, and the course and speed made good are shown to be the sum of those vectors (Fraleigh & Beauregard, 1987). Models such as these entail claims about the physical world.
Data-centric--In multivariate analysis, vector spaces are used to model a set of observations. The data are typically represented as a matrix where items or cases are represented as rows and observations for a particular feature are represented as columns. Geometrically, the cases are understood to be plotted in the space of feature values, but no empirical claim about the features, the nature, or relations among the values need be advanced: in this case, the vector space is simply a way of presenting the values assigned to the observations. This representation typically precedes a transformation of the data, such as reexpressing them in a space of lower dimensionality in order to reveal latent structures or patterns (Green & Carroll, 1976). In that case, the operations performed using the data can be explained and understood as operations on vectors and matrices.
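Two of the levels described above, the algebraic and the physical, lend themselves to a short worked example. The Python sketch below stores polynomials of degree no greater than 2 as coefficient lists and spot-checks two vector space axioms on a single pair, then adds a boat vector and a current vector. The numbers and the helper names add and scale are invented for illustration, and checking one pair of polynomials demonstrates the axioms without proving them.

```python
import math

# Algebraic level: a polynomial a0 + a1*x + a2*x^2 is stored as [a0, a1, a2];
# addition and scalar multiplication act coefficient by coefficient.
def add(p, q):
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    return [c * a for a in p]

p = [1, 0, 2]    # 1 + 2x^2
q = [0, 3, -1]   # 3x - x^2

# Spot-check two of the axioms on this pair (not a proof)
commutes = add(p, q) == add(q, p)
distributes = scale(2, add(p, q)) == add(scale(2, p), scale(2, q))

# Physical level: velocities as (east, north) components, in knots.
# These values are made up for the example.
boat = (0.0, 8.0)      # boat steering due north at 8 knots
current = (3.0, 0.0)   # current setting due east at 3 knots

# Course and speed made good are given by the vector sum
made_good = tuple(add(boat, current))
speed_made_good = math.hypot(*made_good)
```

The same add function serves both cases, which is the point of the abstraction: the algebra is identical, while only the physical example makes a claim about the world.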



