The Role of Classification in Knowledge Representation and Discovery

Library Trends, Summer, 1999 by Barbara H. Kwasnik

Care must be taken, however, in making use of tree representations to ensure that the correct attributes are drawn upon in making inferences. This problem becomes clearer in another part/whole example (see Figure 4).

Figure 4. Part/Whole Relationship.

AUTOMOBILE
      BODY
      ENGINE BLOCK
             PISTONS
             VALVES
      INTERIOR

VALVES are part of the ENGINE BLOCK, but the nature of VALVES is distinct from the nature of PISTONS. Despite their sibling position in the classification, it would be incorrect to assume that they share many attributes the way wolves and dogs do. In fact, VALVES and PISTONS are not similar entities at all. They share the attribute of being part of the ENGINE BLOCK, but that is only a partial explanation of what they are (their essence). Knowing that two components belong to the same engine block would not be sufficient for most practical purposes, whereas knowing that dogs and wolves are closely related might well prove useful. Trees therefore have the following formal requirements:

* Complete and comprehensive information. As in a hierarchy, the entities to be included in a tree must be decided in advance. First, it must be decided what will constitute an entity. Knowledge about the entities must be relatively complete in order to decide on the scope of the classification and the important criteria of distinction.

* Systematic and predictable rules for distinction. The general structure of a tree is determined by the relationships among the entities. Part/whole relationships might be appropriate for some knowledge, while other relationships (such as cause/effect, starting point/outcome, process/product, and so on) might be appropriate for other types of knowledge representation. These relationships should be the ones that best reveal the knowledge of the domain, that is, the way in which all the entities interact with each other.

* Citation order. In both hierarchies and trees it is important to decide the order in which rules of distinction will be invoked. The most important of these decisions is the "first cut" because this determines the shape and eventually the representational eloquence of the classification. If the first cut is a trivial one, the rest of the tree becomes awkward and does not reflect knowledge very well. For example, in the biological classification of animals (a hierarchy), the first cut is: has a backbone/does not have a backbone (vertebrate/invertebrate). While this cut produces a very skewed distribution in terms of numbers of species (there are many times more invertebrate species than vertebrates), the resulting classification proceeds smoothly down the subdivisions and is able to cluster many attributes that "make sense" with respect to what we know about fundamental qualities of animals. In trees, the determination of an appropriate citation order is all the more important because trees are essentially descriptive, and the picture they present will depend on the first branching. For instance, in the AUTOMOBILE example presented above, it would be possible to make the first division BACK OF CAR/FRONT OF CAR/MIDDLE OF CAR, and proceed to decompose those sections into their component parts. But would this make sense? Would it present a reasonable division of an automobile's components? Would it help us with knowledge about cars? Perhaps for someone in some context. There is no easy answer to what constitutes a meaningful division, and the decision often rests on consensual models or tradition.
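Two of the points above can be made concrete in a few lines of Python: a part/whole tree records only containment, so the most it can assert about siblings such as PISTONS and VALVES is the wholes that contain them both; and a different first cut over the same components produces a differently shaped tree. This is a hypothetical sketch, not part of the original article. The dictionaries `BY_SUBSYSTEM` and `BY_LOCATION`, the helper functions, and the placement of components under FRONT/MIDDLE/BACK OF CAR (which assumes a front-engined car) are all illustrative assumptions.

```python
# Hypothetical encoding of Figure 4: each key is a whole, each value lists its
# direct parts. The only relation stored is "is part of"; no other attributes.
BY_SUBSYSTEM = {
    "AUTOMOBILE": ["BODY", "ENGINE BLOCK", "INTERIOR"],
    "ENGINE BLOCK": ["PISTONS", "VALVES"],
}

# Hypothetical alternative first cut by location (assumes a front-engined car).
# BODY spans the whole car, so it is deliberately left without a single place.
BY_LOCATION = {
    "AUTOMOBILE": ["FRONT OF CAR", "MIDDLE OF CAR", "BACK OF CAR"],
    "FRONT OF CAR": ["ENGINE BLOCK"],
    "MIDDLE OF CAR": ["INTERIOR"],
    "ENGINE BLOCK": ["PISTONS", "VALVES"],
}


def parent_of(tree, part):
    """Return the whole that directly contains `part`, or None for the root."""
    for whole, parts in tree.items():
        if part in parts:
            return whole
    return None


def ancestors(tree, node):
    """Return the chain of wholes containing `node`, nearest first."""
    chain = []
    whole = parent_of(tree, node)
    while whole is not None:
        chain.append(whole)
        whole = parent_of(tree, whole)
    return chain


def shared_knowledge(tree, a, b):
    """Everything a pure part/whole tree can assert about both a and b:
    the wholes that contain them both, and nothing about their nature."""
    b_ancestors = ancestors(tree, b)
    return [whole for whole in ancestors(tree, a) if whole in b_ancestors]


def components(tree, root):
    """List every part reachable from `root`, depth first."""
    found = []
    for part in tree.get(root, []):
        found.append(part)
        found.extend(components(tree, part))
    return found


if __name__ == "__main__":
    # Siblings share only their containers, not their essence.
    print(shared_knowledge(BY_SUBSYSTEM, "PISTONS", "VALVES"))
    # The first cut differs between the two trees.
    print(BY_SUBSYSTEM["AUTOMOBILE"])
    print(BY_LOCATION["AUTOMOBILE"])
    # Components the location-based cut has no place for.
    print(set(components(BY_SUBSYSTEM, "AUTOMOBILE"))
          - set(components(BY_LOCATION, "AUTOMOBILE")))
```

Under these assumptions, the only thing the tree can say about PISTONS and VALVES together is that both sit in ENGINE BLOCK and AUTOMOBILE, and the location-based first cut leaves BODY with no single place to go: one concrete way in which a first division can make the rest of the tree awkward.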
