The most influential paper Gerard Salton never wrote
Library Trends, Spring, 2004 by David Dubin
Salton may have imagined that the "underlying basis" represented both empirically derived and psychologically real dimensions. For example, an article by Koll, published the same year as Salton's JDoc article, describes a system (called WEIRD) in which a derived vector space is proposed as a solution to the problem of measuring conceptual similarity (Koll, 1979). Alternatively, Salton may have supposed the basis to represent concepts that are neither psychologically real nor derived from data but rather pure abstractions: a few years earlier Salton's future coauthor Michael McGill had published a paper relating SMART to an abstract but informal vector model proposed by Meincke and Atherton (McGill, 1976; Meincke & Atherton, 1976).
The other significance of the orthogonality/correlation issue in 1979 is that it is a special case of a retrieval modeling issue Salton had cited in 1968: relationships between elements of the various sets. The year 1979 saw the first coupling of this abstract modeling issue with vector representations that had been discussed separately in the 1968 book. Furthermore, the earliest characterizations of Salton's VSM as an IR model appeared that year in separate publications by Salton, McGill, and Koll (Salton, 1979; McGill & Huitfeldt, 1979; Koll, 1979).
Koll identifies the basis in Salton's vector model as the index term vectors. Within a few years, Salton would come to agree that it is the index term vectors (not some other basis) that are assumed to be orthogonal in his VSM. But that position is equally problematic: if the basis vectors represent index terms then those vectors are not assumed to be orthogonal, they simply are orthogonal, because all that the vectors represent is the way that term frequency data are used in the system's computations.
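The point can be illustrated with a minimal sketch, which is not drawn from SMART itself. It assumes a hypothetical three-term vocabulary and raw term frequencies as weights. Each index term is given its own coordinate, so the term vectors are the standard basis and their inner products are zero by construction, not by assumption.

```python
import math

# Hypothetical vocabulary; the terms are illustrative only.
VOCAB = ["retrieval", "vector", "library"]

def term_vector(term):
    """Coordinate vector for an index term: 1 in its own slot, 0 elsewhere."""
    return [1.0 if t == term else 0.0 for t in VOCAB]

def doc_vector(term_freqs):
    """Document vector whose components are raw term frequencies."""
    return [float(term_freqs.get(t, 0)) for t in VOCAB]

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

d1 = doc_vector({"retrieval": 2, "vector": 1})
d2 = doc_vector({"vector": 1, "library": 2})
print(dot(term_vector("retrieval"), term_vector("vector")))  # distinct terms
print(cosine(d1, d2))  # only the shared term contributes
```

In this sketch the zero inner product between distinct terms says nothing about language; it follows directly from storing each term's frequency in a separate component.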
When a commentator on the VSM says that term basis vectors are assumed to be orthogonal, this misstates the actual fact that dependencies among words in natural language are ignored. Approaches such as WEIRD and Latent Semantic Indexing do compute and use information about these dependencies, and although SMART's similarity computations never worked that way, there is ample evidence in the writings of Salton and his colleagues that they understood word/term dependencies and conducted many experiments to employ term associations in retrieval (Salton, 1963; Lesk, 1969; Salton, Buckley, & Yu, 1983).
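One way a computation could take term dependencies into account is to place a term-term association matrix between the two document vectors. The sketch below illustrates only that general idea; it is not the WEIRD or Latent Semantic Indexing algorithm, and the matrix values are invented for the example.

```python
import math

# Hypothetical association matrix for three terms (symmetric, ones on the
# diagonal). The 0.5 entry is an invented association between terms 0 and 1.
ASSOC = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

# Identity matrix: no associations, i.e. dependencies among terms are ignored.
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

def bilinear(u, v, g):
    """Compute u^T g v, an inner product weighted by term associations."""
    return sum(u[i] * g[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def similarity(u, v, g):
    """Cosine-style similarity under the inner product defined by g."""
    nu = math.sqrt(bilinear(u, u, g))
    nv = math.sqrt(bilinear(v, v, g))
    return bilinear(u, v, g) / (nu * nv) if nu and nv else 0.0

# Two documents with no terms in common.
d1 = [2.0, 0.0, 0.0]
d2 = [0.0, 3.0, 0.0]

print(similarity(d1, d2, IDENTITY))  # dependencies ignored
print(similarity(d1, d2, ASSOC))     # dependencies used
```

With the identity matrix the two documents share nothing and score zero; with the invented associations they score above zero even though no term appears in both.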
It is a subtle error of language or description to claim that the VSM assumes term vectors are orthogonal. And it is no coincidence that this error first appeared when the VSM was first characterized as a retrieval model instead of a computation model. If term vector orthogonality is a simplifying assumption, then that implies the existence of correlated terms independent of their operational definition in the computational design choices. But, as with the "underlying basis" of 1979, it is not clear what those entities could be. Evidently, the familiarity of vector space illustrations has led to a confounding of objective facts (that term dependencies and word associations exist) with implications for how those facts might be modeled (as correlations between vectors in a vector space). In 1968 Salton had included the character of relationships among members of the descriptor set as a retrieval modeling issue. By 1979, discussion of those relationships had become inseparable from discussion of similarity computations. That confusion continued to shape reactions to Salton's contributions over the subsequent years.