The most influential paper Gerard Salton never wrote
Library Trends, Spring, 2004 by David Dubin
It is at this last data-centric level that one should understand the use of vector abstractions in most of Salton's IR publications: vector components represent raw or modified observations, and relations between vectors (such as the cosine of the angle between pairs of them) are devices for explaining computations or other design choices about how an IR system operates. As we shall see, the habit of describing data and computations in terms of operations on vectors eventually became so familiar that some later interpretations seem to lose sight of the role the vector model was intended to play.
EARLIEST EXAMPLES
The elements of what would come to be known as the VSM are evident in Salton's earliest publications on experimental IR and also in the work of other authors (Switzer, 1965; Sammon, 1968). In a 1963 article in the Journal of the Association for Computing Machinery (JACM), Salton describes systems and methods for what at that time he calls "associative document retrieval techniques." Building on earlier work by people such as H. P. Luhn, Salton outlines the architecture for automated systems that extract words from machine-readable texts, select a subset of those words deemed significant enough to represent the document content, and compute measures of association between pairs of terms, pairs of documents, and between documents and queries.
Even in this early paper one finds frequencies of extracted words presented using matrix and vector notation and the cosine of angles between vectors recommended as a measure of association. The vector representation is employed to describe similarities computed using both extracted words and citation data. Furthermore, it is clear that vector representations are to be understood precisely at the data-centric level described above: the term-document matrix is called an incidence matrix, leaving no doubt that what the vector components model are observations. The similarity measures are at all points described as methods or operations on the data that can be interpreted as relations between vectors.
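The data-centric reading described above can be made concrete with a minimal sketch: word frequencies are stored as vector components, and the cosine is simply a computation on those counts. The vocabulary, the sample texts, and the function names `term_vector` and `cosine` are hypothetical illustrations, not taken from Salton's paper, and the whitespace tokenization is an assumption chosen for brevity.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text.

    Assumption: lowercase whitespace tokenization stands in for the
    word-extraction step; real systems did considerably more.
    """
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: an empty vector matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical vocabulary and documents, for illustration only.
vocabulary = ["retrieval", "document", "vector", "citation"]
doc_a = term_vector("document retrieval by document vector", vocabulary)
doc_b = term_vector("citation data for document retrieval", vocabulary)
query = term_vector("vector retrieval", vocabulary)

print(cosine(doc_a, doc_b))
print(cosine(query, doc_a))
print(cosine(query, doc_b))
```

Each vector here is nothing more than a row of observed counts, which is the sense in which the vector notation describes data rather than a geometric theory of meaning.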
SMART was the system Salton developed over the course of his career as an IR researcher. More than just an IR system, SMART was the working expression of Salton's theories and the experimental environment in which those theories were evaluated and tested (Salton, 1971). The earliest papers describing the SMART system show that the same extraction and association procedures outlined in the JACM article are central to SMART's design and operation (Salton, 1965b; Salton & Lesk, 1965). In 1965 Salton published a paper in IEEE Spectrum titled "Progress in Automatic Information Retrieval" (1965a). That article discusses specific features of SMART and characterizes document representations and similarity computations in terms of vectors. In addition, relevance feedback experiments (conducted by J. J. Rocchio) are described in terms of query vector modifications. In all these examples, the vector spaces illustrate how computations such as similarity measures and relevance feedback are applied to the data; the vector spaces are models of computations executed by the system.
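Relevance feedback as a query vector modification can be sketched in the same spirit. The passage above does not give a formula, so the update below follows the form commonly presented in IR textbooks (move the query toward the average relevant document and away from the average nonrelevant one). The function name `rocchio`, the default weights, and the clipping of negative components to zero are all assumptions made for illustration, not details drawn from the 1965 article.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector.

    Assumed textbook form:
        new_q = alpha * q + beta * mean(relevant) - gamma * mean(nonrelevant)
    The default weights are illustrative values, not figures from the source.
    """
    def centroid(vectors):
        # Component-wise mean; an empty set contributes a zero vector.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel = centroid(relevant)
    non = centroid(nonrelevant)
    # Assumption: negative term weights are clipped to zero.
    return [
        max(0.0, alpha * q + beta * r - gamma * n)
        for q, r, n in zip(query, rel, non)
    ]


# Hypothetical three-term example.
original_query = [1, 0, 0]
judged_relevant = [[0, 2, 0], [0, 4, 0]]
judged_nonrelevant = [[2, 0, 0]]
print(rocchio(original_query, judged_relevant, judged_nonrelevant))
```

As with the cosine measure, the vector arithmetic here is a description of what the system does to its stored term weights after a user judges some retrieved documents.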



