Faceted classification and logical division in information retrieval

Library Trends, Winter 2004, by Jack Mills

3.4. Precoordinate indexes

Apart from a few special collections, this was the only form of subject catalog used until the 1950s. The term refers to the handling of compound subjects, which constitute the vast majority in the literature. The constituent terms that in combination (coordination) describe the subject are coordinated in the subject heading or classmark in anticipation of the needs of searchers. Compounding immediately raises the problem of distributed relatives; this problem, absolutely central to shelf order, continues to be central to the organization of the surrogates also, despite their much greater facilities for providing multiple access. How the separate concepts needed to describe the compound subject are linked depends on the relationships subsisting between them, and these, in turn, determine the search strategies for locating the information sought. The problem of distributed relatives that this poses can be ameliorated (but never completely resolved) by making multiple entries for a document with a compound subject so that a separate entry appears directly under each of its major constituent concepts. For example, the document referred to earlier might get a separate entry under each of the four constituent concepts: Skeletal system, Cancer, Therapy, and Radiography (but omitting separate entries for the other twenty permutations theoretically possible). Such permutation is standard practice in libraries using the Universal Decimal Classification (UDC), whose notation particularly provides for it. Such permutation of multiple entries is rarely found in the alphabetical subject catalog. Notably, most subject cataloging takes as its unit a complete and discrete record (a book or article), and its classification and indexing involve a process of summarization. The subject description is of the record as a whole, and this determines its position.
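The multiple-entry practice described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `rotated_entries` is hypothetical, the order in which the four concepts are listed is taken from the example and is not a prescribed citation order, and simple cyclic rotation is one possible way of bringing each concept to the lead position, not necessarily the method used with UDC.

```python
from itertools import permutations


def rotated_entries(concepts):
    """Return one entry per concept, each led by a different concept.

    Assumption: entries are produced by cyclic rotation of the listed order.
    """
    return [concepts[i:] + concepts[:i] for i in range(len(concepts))]


# The four constituent concepts named in the example above.
concepts = ["Skeletal system", "Cancer", "Therapy", "Radiography"]

entries = rotated_entries(concepts)

# All orderings theoretically possible for four concepts (4! = 24).
all_orderings = list(permutations(concepts))

# Orderings for which no separate entry is made.
unused = len(all_orderings) - len(entries)

for entry in entries:
    print(" : ".join(entry))
print("orderings without an entry:", unused)
```

Each concept appears once in the lead position, so a searcher entering under any one of the four finds the document, while the remaining orderings are left without entries.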

3.5. Postcoordinate indexes

The development of mechanical aids to indexing (e.g., peek-a-boo, machine punched-cards) from the 1950s onward removed the need to summarize the overall content in a single precoordinated subject description. Now, only single constituent terms were assigned, and their combination to form a search request for the subject concerned was left to the search stage. This system was called postcoordinate indexing since the coordination took place after the indexing step; it demanded less effort of the indexer but moved the burden of coordination onto the searcher. The absence of recognized relationships could result in ambiguity, e.g., a search for fertilizers for sugar beet by the simple coordination of Sugar and Beet and Fertilizers would also produce documents on the use of sugar-beet tops as fertilizers. This led to the reintroduction of classification at the indexing stage in the form of role indicators and other devices that are implicit in the precoordinate index.
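The sugar-beet ambiguity can be shown in a minimal sketch of postcoordinate searching, in which coordination is simply the intersection of the sets of documents posted under each term. Everything specific here is an assumption made for illustration: the three documents, their numbers, and the role labels `crop_treated`, `material_used`, and `product` are hypothetical and are not taken from any actual role-indicator scheme.

```python
# Hypothetical collection: each document carries (term, role) assignments.
# 1: fertilizers for sugar beet
# 2: sugar-beet tops used as fertilizers
# 3: fertilizers for wheat
documents = {
    1: {("Sugar", "crop_treated"), ("Beet", "crop_treated"),
        ("Fertilizers", "material_used")},
    2: {("Sugar", "material_used"), ("Beet", "material_used"),
        ("Fertilizers", "product")},
    3: {("Wheat", "crop_treated"), ("Fertilizers", "material_used")},
}


def build_index(documents, with_roles):
    """Post each document under its terms, with or without role indicators."""
    index = {}
    for doc_id, assignments in documents.items():
        for term, role in assignments:
            key = (term, role) if with_roles else term
            index.setdefault(key, set()).add(doc_id)
    return index


def coordinate(index, keys):
    """Coordinate at the search stage: intersect the postings for every key."""
    postings = [index.get(key, set()) for key in keys]
    return set.intersection(*postings) if postings else set()


# Simple coordination of bare terms cannot tell the two subjects apart.
plain_hits = coordinate(build_index(documents, with_roles=False),
                        ["Sugar", "Beet", "Fertilizers"])

# With role indicators, only the document on fertilizers for sugar beet matches.
role_hits = coordinate(build_index(documents, with_roles=True),
                       [("Sugar", "crop_treated"), ("Beet", "crop_treated"),
                        ("Fertilizers", "material_used")])

print(sorted(plain_hits))
print(sorted(role_hits))
```

The bare-term search retrieves both the wanted document and the false drop; the role-qualified search restores at the indexing stage the relationship that a precoordinate heading would have expressed implicitly.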

Mechanical aids were soon supplanted by electronic systems, and a still more drastic change in indexing practice followed. With the development of networks for electronic retrieval, the economic burden presented by the prior indexing of individual records (typically, for services operated commercially) became prohibitive. Now, it was not just a case of abandoning the intellectual precoordination of index terms but the abandoning of preindexing altogether. Reliance was to be entirely on keywords found in the record and recognized by electronic searching. Indexing devices developed by librarians can only be used indirectly, by assisting the framing of requests to search engines. The limited discriminatory powers of keywords, with all their attendant ambiguities in the unruly natural language, were now supplemented by new index devices, with machines operating on the relatively raw text of the documents. All of them are based on the measurement of relatively artificial characteristics of documentary texts, such as frequency of occurrence of particular words, contiguity of particular words, etc., using statistical techniques and mathematical algorithms. These are deemed sufficiently correlative to conceptual meanings to form classes allowing searches defined conceptually. They constitute new index devices, but they are still classificatory in operation, establishing subclasses of the total store identified by the parameters of the technique used. They are not assigned by an indexer but must utilize the computer programs of the store's service provider.
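The kind of measurement referred to above, frequency of occurrence and contiguity of particular words, can be sketched in a few lines. This is a deliberately crude illustration and not the technique of any particular search service: the two sample texts, the function names, and the threshold parameter `min_adjacent` are all hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Split raw text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(text, word):
    """Count how often a word occurs in the text."""
    return Counter(tokens(text))[word.lower()]


def contiguity(text, first, second):
    """Count how often `first` is immediately followed by `second`."""
    words = tokens(text)
    pair = (first.lower(), second.lower())
    return sum(1 for a, b in zip(words, words[1:]) if (a, b) == pair)


def subclass(texts, first, second, min_adjacent=1):
    """Form a subclass of the store: texts where the word pair is contiguous."""
    return {doc_id for doc_id, text in texts.items()
            if contiguity(text, first, second) >= min_adjacent}


# Hypothetical raw texts standing in for unindexed records.
texts = {
    "a": "Fertilizers for sugar beet: nitrogen fertilizers raise sugar beet yields.",
    "b": "Beet tops as fertilizers; sugar content of the tops is low.",
}

print(term_frequency(texts["a"], "fertilizers"))
print(contiguity(texts["a"], "sugar", "beet"))
print(sorted(subclass(texts, "sugar", "beet")))
```

No indexer assigns anything here; the subclass is defined entirely by the chosen parameters (which words, how close, how often), which is the sense in which such devices remain classificatory in operation.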


 


Content provided in partnership with Thomson Gale