Designing a knowledge discovery system, Part 2: now that we have categorized, let's … classify!

Computer Technology Review, Oct, 2003 by Claude Vogel

All people really want to do when they use a search engine, portal, or even a full-blown knowledge management system is answer a question. They want all the information relevant to their question so they can formulate an answer. To support this, knowledge management systems must shift from a "retrieval-" to a "discovery-"based orientation. Next-generation knowledge discovery systems will introduce users to a new way of searching information assets by better complementing the user's own cognitive approaches to finding information. These systems must simultaneously manage vast and continually changing stores of information, as well as the idiosyncratic nature of the user.

To handle this dual requirement, knowledge management system design should be split into two consecutive phases. The first phase should focus on the organization and its need for a maintainable, reliable and universally understandable information repository. The focus in phase one is internal. This phase depends on the proper use of ontologies and taxonomies, as described in Part One of this series ("A Roadmap to Proper Taxonomy Design," Computer Technology Review, July 2003).

The second design phase (and the focus of this article) is the user-centric, externally focused dynamic classification phase--which, when layered on top of a solid taxonomically based informational foundation, constitutes a powerful, scalable and flexible system capable of complex problem-solving support. The beauty of dynamic classification is this flexibility and ability to adjust in the face of the huge and constantly changing information assets available today.

What's required is a set of tools that help individuals extract small details or serendipitously discover data relationships within the information foundation, in ways that make unique sense to them personally.

What is a Classification?

A classification can be visualized as a tree representation of what is actually a multi-dimensional matrix. "Dynamic" classification is the ability to cross and combine these dimensions--essentially slicing and dicing information as desired and in real time--to place information into the perspective most meaningful to the user within a unique, time-specific problem-solving context.

It is this specific ability, the user-definable slicing and dicing of data, that supports knowledge discovery rather than mere information retrieval.

These information dimensions, or trees, may be shifted or reversed, dramatically affecting the resulting classification even though the information attached to the categories within a dimension remains unchanged. It is easy to see that these trees are permutable. In the two classifications in Figure 1, the dimensions used are the same. However, when the dimension order shifts, an entirely different perspective is generated.

In Classification 1, we can see diseases within an African context. In Classification 2, we can see the epidemiology of Alzheimer's disease across different geographical contexts.
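The effect of reordering dimensions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents, their tags, and the `classify` helper are hypothetical and are not taken from Figure 1. Each document is tagged once per dimension, and the tree is built by nesting categories in whatever dimension order the user picks.

```python
# Hypothetical documents, each tagged with one category per dimension.
DOCS = [
    {"id": "doc1", "geography": "Africa", "disease": "Malaria"},
    {"id": "doc2", "geography": "Africa", "disease": "Alzheimer"},
    {"id": "doc3", "geography": "Europe", "disease": "Alzheimer"},
    {"id": "doc4", "geography": "Asia", "disease": "Alzheimer"},
]


def classify(docs, dimension_order):
    """Build a nested tree whose levels follow dimension_order.

    The tags on each document never change; only the nesting order does.
    """
    tree = {}
    for doc in docs:
        node = tree
        # Walk (or create) one branch per dimension except the last.
        for dim in dimension_order[:-1]:
            node = node.setdefault(doc[dim], {})
        # The last dimension holds the list of document ids (the leaves).
        node.setdefault(doc[dimension_order[-1]], []).append(doc["id"])
    return tree


# Same dimensions, two different orders.
classification_1 = classify(DOCS, ["geography", "disease"])
classification_2 = classify(DOCS, ["disease", "geography"])

print(classification_1["Africa"])     # diseases within an African context
print(classification_2["Alzheimer"])  # one disease across geographies
```

Both trees are derived from the same tagged documents, which is why the order can be switched at query time without re-indexing anything.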

The ability to shift dimension order is a true benefit of dynamic classification. Individuals need to understand many variables in order to make a good decision--particularly in an urgent situation. Moreover, each individual will go about this process in a different way. For example, if a terrorist attack were imminent, the local police, the FBI and medical personnel would all want essentially the same information, but each from its own perspective.

Dynamic classifications generally occur in identifiable patterns: geography/topic, horizontal/vertical and vertical/vertical. This is useful to keep in mind as you begin to design your dynamic classification tools and identify the dimensions you will offer your users.

Geography is the most commonly used dimension because it is an analytical element of so many decision-making processes. Terrorism in the Philippines, criminal law in Texas, or domestic sales, for example, would all involve a geographic tree.

Examples of the horizontal/vertical pattern would be the petroleum business or anti-money laundering regulations. The horizontal tree (business) includes broad categories such as marketing, research, health and safety, etc. By crossing a horizontal category with a vertical dimension such as Petroleum, which may contain such categories as crude oil or solid waste, you would derive categories populated by documents highly specific to that type of business.

The other major pattern, vertical/vertical, really displays the power of dynamic classification. An example would be crossing "MeSH Proteins" with "MeSH Diseases." In this case you would see all categories containing documents with information matching both trees.
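Crossing two dimensions can be pictured as intersecting the document sets attached to their categories. The sketch below is a minimal illustration, not the design of any particular product: the category names echo the petroleum example, while the document ids and the `cross` helper are hypothetical.

```python
from itertools import product

# Hypothetical category -> document-id sets for a horizontal and a vertical tree.
BUSINESS = {
    "Marketing": {"d1", "d2"},
    "Research": {"d3"},
    "Health and safety": {"d4", "d5"},
}
PETROLEUM = {
    "Crude oil": {"d1", "d3", "d4"},
    "Solid waste": {"d5"},
}


def cross(first, second):
    """Return the non-empty cells of first x second.

    Each cell is keyed by (category in first, category in second) and
    holds the documents that carry both tags.
    """
    cells = {}
    for (a, docs_a), (b, docs_b) in product(first.items(), second.items()):
        shared = docs_a & docs_b  # documents tagged with both categories
        if shared:                # skip empty combinations
            cells[(a, b)] = shared
    return cells


cells = cross(BUSINESS, PETROLEUM)
for (horizontal, vertical), docs in cells.items():
    print(f"{horizontal} / {vertical}: {sorted(docs)}")
```

Only combinations that actually contain documents are kept, so the derived categories stay specific to the business being examined. The same function applies unchanged to a vertical/vertical cross.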

You can see how quickly this process becomes complex. There are thousands of diseases, and if you cross them with the dimension of proteins alone, you could have millions of possible combinations. If you add a third dimension, chemical compounds, you move quickly into the billions. The virtual space containing your multi-dimensional operation is huge, even when you use only a small part of it.
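A quick back-of-the-envelope calculation shows the growth. The dimension sizes below are hypothetical round numbers chosen only to match the orders of magnitude mentioned above, and the category and document names are placeholders. The sketch also illustrates one common way to cope with such a space: store only the cells that actually hold documents.

```python
from math import prod

# Hypothetical dimension sizes (illustrative orders of magnitude only).
DIMENSION_SIZES = {"diseases": 5_000, "proteins": 2_000, "compounds": 1_000}


def virtual_cells(dimensions):
    """Number of possible category combinations across the given dimensions."""
    return prod(DIMENSION_SIZES[name] for name in dimensions)


two_way = virtual_cells(["diseases", "proteins"])
three_way = virtual_cells(["diseases", "proteins", "compounds"])

# Sparse storage: only occupied cells are kept, keyed by one category
# per dimension. Keys and document ids here are placeholders.
occupied = {
    ("disease_17", "protein_204", "compound_9"): ["doc_a", "doc_b"],
    ("disease_17", "protein_311", "compound_9"): ["doc_c"],
}

print(f"{two_way:,} possible two-way cells")
print(f"{three_way:,} possible three-way cells")
print(f"{len(occupied)} cells actually stored")
```

With these assumed sizes, the two-way space has ten million cells and the three-way space ten billion, yet the storage cost tracks the number of occupied cells rather than the size of the virtual space.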

 


Content provided in partnership with Thompson Gale