The most influential paper Gerard Salton never wrote
Library Trends, Spring, 2004 by David Dubin
This discussion is noteworthy for two reasons. First, the orthogonality assumption is described as applying to the term vectors (rather than some unspecified basis as in the 1979 article). Second, it is another telling example of the retrieval/computational model confusion. On the one hand, the authors correctly express a retrieval model issue, that is, the decision to treat words as unrelated. They acknowledge that dependencies known to exist between words in texts are not represented, measured, or used by the system. Salton and McGill understand the impact that this might have on retrieval results and explain why they choose to dismiss that concern.
On the other hand, the authors describe this decision in terms of a vector orthogonality assumption. As explained earlier, term vector orthogonality is not an assumption but rather a fact resulting from definition. Indeed, it is not even accurate to describe the retrieval model as depending on an assumption of term independence; the SMART system makes no probabilistic inference that could be falsified but merely computes document/query similarity in particular ways. (2) This characterization of SMART is another unfortunate consequence of seeing vector spaces as an IR model. As mentioned earlier, it invited Wong and Raghavan to question Salton's theoretical rigor the following year.
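The point that orthogonality follows from definition, rather than being assumed, can be made concrete with a minimal sketch. It assumes, for illustration only, that each term is represented as a standard basis vector and that similarity is the cosine measure; the function names and the sample weights are hypothetical and are not taken from SMART.

```python
import math

def term_basis(n):
    # Assumption: term i is the i-th standard basis vector of R^n.
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def dot(u, v):
    # Ordinary inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

terms = term_basis(3)

# Distinct term vectors have inner product 0 by construction;
# nothing about word usage in any text is being asserted here.
pairwise_orthogonal = all(
    dot(terms[i], terms[j]) == 0.0
    for i in range(3) for j in range(3) if i != j
)

# Hypothetical term weights for one document and one query.
doc = [2.0, 1.0, 0.0]
query = [1.0, 0.0, 0.0]
score = cosine(doc, query)
```

Nothing in this computation is a probabilistic claim that data could falsify: `score` is simply a number derived from the chosen weights, which is the sense in which the system "merely computes" similarity.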
Salton's 1989 book, Automatic Text Processing, includes the author's first full description of the VSM as an IR model. Ironically, much of the characterization is adapted directly from Wong and Raghavan's earlier criticism of what they interpreted Salton to have meant. The illustration of the document space in chapter 10 is an exact copy of figure 1 in Wong and Raghavan's 1984 paper (and their 1986 follow-up) and depicts the term vectors at oblique angles to one another rather than at right angles as in the 1975 TDV papers. Based on Wong and Raghavan's criticism, Salton corrects an earlier (1979) error on the use of term and document correlations to define an orthogonal basis and follows their example in calling for additional information to define the correlations. Citing Raghavan and Wong, Salton repeats the 1983 mischaracterization that term vector orthogonality implies an assumption of term independence.
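The difference between term vectors at right angles and term vectors at oblique angles can be sketched as follows. This is an illustration of the general idea only, not a reproduction of Wong and Raghavan's formulation: the inner product of two documents is computed through a matrix of term-term inner products, and the correlation value 0.5 is a hypothetical stand-in for the "additional information" that would have to be supplied.

```python
def similarity(a, b, gram):
    # Inner product of sum_i a[i]*t_i and sum_j b[j]*t_j,
    # where gram[i][j] holds the inner product of term vectors t_i and t_j.
    n = len(a)
    return sum(a[i] * gram[i][j] * b[j] for i in range(n) for j in range(n))

# Orthogonal unit-length term vectors: the matrix is the identity.
orthogonal_terms = [[1.0, 0.0],
                    [0.0, 1.0]]

# Oblique term vectors: a hypothetical correlation of 0.5 between the two terms.
oblique_terms = [[1.0, 0.5],
                 [0.5, 1.0]]

doc = [1.0, 0.0]    # contains only the first term
query = [0.0, 1.0]  # contains only the second term

score_orthogonal = similarity(doc, query, orthogonal_terms)
score_oblique = similarity(doc, query, oblique_terms)
```

With orthogonal term vectors a document and a query that share no terms score zero; with oblique term vectors they can still score above zero, but only because the off-diagonal values were provided from outside the term weights themselves.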
[FIGURE 1 OMITTED]
EPILOGUE: THE PAPER SALTON NEVER WROTE
As one would expect, published references to the vector model are usually much briefer than the detailed responses, extensions, and alternative proposals discussed above. An author may state, for example, that his or her experimental system realizes or is based on the VSM. Or the VSM may simply be included in a list of other models or formalisms.
It is ironic that in these references the most popular citations for the VSM seem to be the two TDV papers, the 1983 text, and the 1971 collection of SMART system articles. These choices are understandable: the CACM article was suggestively titled, and both it and the JASIS article included the same evocative illustration for figure 1. The 1971 text concerns SMART, the design of which largely defined the loose bundle of operational assumptions and expectations that people associate with systems based on the VSM. The 1983 book by Salton and McGill included descriptions that made it clear that the abstract and computational modeling issues that had been kept distinct in 1968 were by then inextricably intertwined.


