Organizing moving image collections for the digital era: the Goldspiel report

Information Outlook, August, 2002 by James M. Turner, Michele Hudon, Yves Devin

Several of the questions in our questionnaire had to do with the lexical content of the thesauri (e.g., the total number of terms, the number of terms that were not descriptors, the number of proper names and so on). Unfortunately, it was next to impossible to collect precise data of this type from the participating institutions. As it turned out, most of the thesauri are managed by proprietary software that was unable to generate statistics useful to us. Because of this, the figures given in table 4 are mostly estimates derived from lexical samples taken from the thesauri to which we were given access. They are presented only as an indication of the great variation in the number of terms found in the tools in use.

We note the large proportion of terms (almost a third in each tool) that are proper nouns (names of persons, institutions or geographical places). We also note the small proportion of terms other than descriptors, that is, of entry terms that simply point to a preferred descriptor. From this we can deduce that the tightening of vocabulary by synonym control has not taken place and that the efficiency of the tools for retrieval is thus weakened. The terms included in the various thesauri come from a number of sources, such as general and specialized reference works, user queries and existing semantic networks such as those found in other thesauri.

Most of the thesauri in use in the participating organizations had an explicit relational structure connecting descriptors by relations of equivalence, of hierarchy and of various kinds of association. Only one had cross-language associations (a bilingual English-French thesaurus; all the others used only English), but all six had hierarchical and associative relationships, and four of the six had some control of synonyms. The fact that only four of the six thesauri exercised some kind of control over conceptual and terminological equivalence again suggests that semantic control is only partial and thus probably somewhat ineffective. However, the similar structures suggest that standards for thesaurus construction were nevertheless taken into account.
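To make the three kinds of relations concrete, here is a minimal sketch in Python of how they might be represented and used at retrieval time. It is an illustration only: the `Thesaurus` class, its method names and the sample terms are all hypothetical, and nothing here is drawn from the tools we surveyed.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Hypothetical in-memory thesaurus; not any surveyed institution's tool."""

    # Equivalence: entry term (non-descriptor) -> preferred descriptor.
    use: dict = field(default_factory=dict)
    # Hierarchy: descriptor -> set of narrower descriptors.
    narrower: dict = field(default_factory=dict)
    # Association: descriptor -> set of related descriptors.
    related: dict = field(default_factory=dict)

    def add_synonym(self, entry_term, descriptor):
        self.use[entry_term.lower()] = descriptor.lower()

    def add_narrower(self, broader, narrower_term):
        self.narrower.setdefault(broader.lower(), set()).add(narrower_term.lower())

    def add_related(self, a, b):
        # Associative relations are recorded in both directions.
        a, b = a.lower(), b.lower()
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def preferred(self, term):
        """Map an entry term to its descriptor; unknown terms pass through."""
        term = term.lower()
        return self.use.get(term, term)

    def expand(self, term):
        """Return the preferred descriptor plus all of its narrower terms."""
        start = self.preferred(term)
        seen = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for child in self.narrower.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


# Sample terms are invented for illustration.
t = Thesaurus()
t.add_synonym("movies", "motion pictures")
t.add_narrower("motion pictures", "documentaries")
t.add_narrower("documentaries", "newsreels")
t.add_related("motion pictures", "television programs")

print(sorted(t.expand("movies")))
```

In this sketch, a search on the entry term "movies" is first mapped to its descriptor and then widened to the narrower terms. A thesaurus without the equivalence table would treat "movies" as an unknown term and miss everything indexed under the preferred descriptor, which is the weakness described above.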

The effectiveness of tools for vocabulary management can only be maintained if the lexical and relational content is kept up to date. The responses to our question concerning the frequency and regularity of updates showed that three of the six thesauri were updated as needed, with changes integrated immediately and dynamically into the database. Of the other three, one was updated daily, one weekly and one irregularly.

For three of the thesauri, a single person was responsible for updating the thesaurus and for making decisions about controlling and expanding the semantic networks. In the case of another thesaurus, two people were responsible for its upkeep, and for the two remaining thesauri all the users contributed to the updating operations. Formal procedures or guidelines for updating these tools were not always available.

Updating a thesaurus has largely to do with creating new descriptors. Responses to our question about the number of new descriptors added annually were rather surprising. Half of the thesauri grew by a maximum of 50 new descriptors annually, while the other half grew by more than 300. We might wonder about the causes of this disparity among tools which, conceptually at least, should be rather similar. However, managers of the thesauri we studied were unable to say with any certainty what proportion of the terminology present at the time the data was collected had been included by the end of the first, the third and the fifth years of the existence of the thesaurus, nor at what moment the rate of term creation had leveled off and attained its present level. While it may be fairly clear that the number of terms necessary for indexing a general collection of moving images reaches a peak after which only a few new terms need to be added, the data we obtained does not permit us to identify where that peak is situated.

Content provided in partnership with Thompson Gale