The Role of Classification in Knowledge Representation and Discovery

Library Trends, Summer, 1999 by Barbara H. Kwasnik

Trees are useful knowledge representations for the following reasons:

* Highlight/Display relationship of interest. This is the primary strength of a tree. It lays out the entities comprising a domain in a pattern of classes that highlights or makes evident the important or defining relationships among them.

* Distance. A tree reveals the distances between entities (either physical distances or metaphorical ones). Thus one can determine that a COLONEL is "closer" to a GENERAL than is a PRIVATE, at least along the dimension of chain of command. If entities are components of the same supercomponent, they are "closer" in space or in function.

* Relative frequency of entities. This feature of trees is also shared by hierarchies. When entities cluster in large numbers under one classification label, this is frequently an opportunity for the creation or discovery of new rules for distinguishing among them. When a cluster is small and has only a few entities, these entities tend to be treated as if they were all the same. It may be neither feasible nor reasonable to make distinctions among them, and taking account of any differences may not support the enterprise. Once the cluster grows, however, and the number of entities reaches a critical mass, it might be useful to further differentiate them. In such a case it is necessary to discover new knowledge that will suggest the best way of making these finer distinctions. Conversely, when a category consistently has a membership of one or just a few, it might signal the need for merging categories and rethinking the logic behind the division in the first place. In this case also, it is necessary to generate new knowledge in order to guide the merging or shifting of the orphan categories.
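
The distance and relative-frequency points above can be made concrete with a small sketch. It is an illustration only, not a method taken from the article: the chain of command is a simplified, assumed one, the names `PARENT`, `distance`, and `review_clusters` are hypothetical, and the `split_at` and `merge_at` thresholds are arbitrary placeholders for whatever "critical mass" means in a given domain.

```python
# Hypothetical, simplified chain of command stored as child -> parent.
PARENT = {
    "COLONEL": "GENERAL",
    "CAPTAIN": "COLONEL",
    "SERGEANT": "CAPTAIN",
    "PRIVATE": "SERGEANT",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance(a, b):
    """Count the edges between a and b through their nearest shared ancestor."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    steps_from_b = {node: i for i, node in enumerate(up_b)}
    for steps_from_a, node in enumerate(up_a):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    raise ValueError("nodes are not in the same tree")


def review_clusters(members_by_label, split_at=5, merge_at=1):
    """Flag labels whose clusters look too large or too small.

    The thresholds are arbitrary assumptions, not values from the article.
    """
    flags = {}
    for label, members in members_by_label.items():
        if len(members) >= split_at:
            flags[label] = "consider splitting"
        elif len(members) <= merge_at:
            flags[label] = "consider merging"
    return flags
```

With this assumed tree, `distance("COLONEL", "GENERAL")` is smaller than `distance("PRIVATE", "GENERAL")`, which is the sense in which a COLONEL is "closer" to a GENERAL. A flag from `review_clusters` only marks where new knowledge is needed; it does not say how to split or merge a category.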

The use of trees as knowledge representations shares some of the same problems as does the use of hierarchies:

* Rigidity. Because a tree is characterized by the relationships among entities and the citation order, the general shape of the tree--its expressiveness as a knowledge representation--is determined a priori. This means that new entities can be added if they fit into a place in the structure, but if a new entity or new knowledge does not fit well, the entire structure must be rethought and sometimes rebuilt.

* One-way flow of information. In a hierarchy, information flows in two directions: vertically, between classes, superclasses, and subclasses, and also laterally, between sibling classes (classes sharing the same superclass). In a tree, even if it is a part/whole representation, information flows only vertically, up and down. Siblings in a class may in fact be entirely different types of objects. So there are rules for species but not for differentia. Many people assume that, because Syracuse and New York City are both in New York State, the two cities are similar, when in fact they share the attribute of being in the same state and little else. Syracuse may be more like some city in another state than it is like New York City. At any rate, the tree classification is not particularly good at representing multidirectional complex relationships.
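
The Syracuse example can be sketched in a few lines to show why sibling status in a part/whole tree says nothing about similarity. Everything beyond the two cities and the state is an assumption made for illustration: "Other City" and "Other State" are placeholders, the trait labels are invented and are not facts about any real city, and `are_siblings` and `trait_overlap` are hypothetical helper names.

```python
# Part/whole tree stored as child -> parent.
# "Other City" and "Other State" are placeholders, not real places.
PART_OF = {
    "Syracuse": "New York State",
    "New York City": "New York State",
    "Other City": "Other State",
}

# Invented trait labels for illustration only; not facts about any city.
TRAITS = {
    "Syracuse": {"trait_a", "trait_b"},
    "New York City": {"trait_c", "trait_d"},
    "Other City": {"trait_a", "trait_b"},
}


def are_siblings(a, b):
    """True when two distinct entities share the same parent in the tree."""
    parent = PART_OF.get(a)
    return a != b and parent is not None and parent == PART_OF.get(b)


def trait_overlap(a, b):
    """Jaccard overlap of two trait sets: shared traits / all traits."""
    union = TRAITS[a] | TRAITS[b]
    return len(TRAITS[a] & TRAITS[b]) / len(union) if union else 0.0
```

Under these made-up traits, Syracuse and New York City are siblings with no overlap, while Syracuse and the placeholder city overlap fully without being siblings. The tree records only the vertical part/whole link; the lateral comparison has to come from data held outside it.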

 


Content provided in partnership with Thomson Gale