Salvaging Information Engineering Techniques In A Data Warehouse Environment - Industry Trend or Event

Computer Technology Review, Feb, 2001 by Anthony L. Politano

During the 1980s and early 90s, Information Engineering (IE) was in its prime. Most major corporations were utilizing some form of system development methodology that, could be tied back to IE. The first step in any IE project was the Information Strategy Plan or ISP. The ISP would look at the data, process, organization, technology, and interactions of an enterprise. The ISP was top-down analysis at its best. Three key deliverables of an ISP were a data model, functional decomposition, and an interaction (CRUD) matrix. The data model was an entity relationship diagram that encompassed the entire enterprise. The functional decomposition diagram would examine the business functions and decompose to a process level. The CRUD matrix, which stands for CREATE, READ, UPDATE, DELETE, examined the interaction of data and process. These three deliverables provided a basis for top down analysis.

The mid-90s saw the rise of data warehousing and its related disciplines. One aspect of data warehousing is the data mart. A data mart is a single subject area data warehouse usually developed to support a single business unit. The data warehouse market has realized that there must be a balancing act between the "build it and they will come" approach to data warehousing and the single subject legacy data marts or "legamarts".

The salvaging of some information engineering techniques can provide the contextual models and top down requirements needed to create architected data warehouses. Since many people in a leadership or architecture role in data warehousing are descendents of the information engineering age, they should be familiar with the CRUD matrix and its techniques.

This article will describe three techniques that can utilize existing information engineering in a data warehouse project. First, the entity relationship diagram and its use in a three phase data model approach. Second, the functional, decomposition diagram and its use in segmenting and defining key performance indicators and dimensions. Third, creating a modified. CRUD matrix that deals with logical entities and current systems.

The Entity Relationship Diagram

The entity relationship diagram is the standard data technique for creating data models. The entity relationship diagram enables an analyst to create a graphical view of the data concepts of an organization and their relationships. Traditional system development dictates creation of an entity relationship (Entity Relationship) diagram that is converted to a database design of a relational database.

In a data warehouse environment the traditional normalized Entity Relationship cannot be easily translated into a database design. By nature a normalized Entity Relationship diagram tends to separate the data concepts into separate entities. A traditional approach to Entity Relationship modeling is concerned with three concepts: entities, relationships, and attributes.

Components Of The Entity Relationship Diagram

Entity. A data concept which has relevance to the enterprise. An entity can be a person, place, thing, or concept. Typically an entity consists of a single identifiable concept such as EMPLOYEE, STUDENT, CLASS, PURCHASE ORDER, or SHIPMENT. An entity can consist of subtypes. Subtypes are a decomposition of an entity into its various types. For example an EMPLOYEE entity can be modeled with subtypes FULL-TIME and PART-TIME. Subtyping is necessary when clarity is required about the data (and to some respect, the behavior) of the Supertype entity.

Relationship. A relationship is a description about the relationship that exists between two entities. Information about how the entities relate, in particular, the optionality and cardinality of the relationship is modeled. A relationship should only be modeled when the relationship has relevance. If one desired, any entity could loosely be related to any other entity, but this is not the intention of modeling relationships. A special relationship, known as a recursive relationship, exists between and entity and itself, such as an EMPLOYEE to EMPLOYEE related by a REPORTS TO relationship.

Attributes. Attributes are details about a specific entity. These details provide greater clarification about the data that can or will be captured regarding an entity. One must be careful not to confuse entities and attributes. Entities can exist without attributes, but attributes cannot exist without entities.

Data Modeling For A Data Warehouse

There has been significant work done on utilizing specialized data modeling techniques for data warehousing. In particular, the dimensional approach has been adopted to model data warehouses for a relational database. With a dimensional modeling approach, many of the traditional normalization techniques are not utilized. Instead, the model utilized a mixed approach of highly normalized portions of the model and highly denormalized parts of the model. The model is centered on two types of entities, facts and dimensions.

Facts are entities that deal with measurements or indicators. A fact entity for a sales organization could measure revenue per month, or units sold per day. A fact for a manufacturing organization could measure defects per lot per day or units produced per week.


 

BNET TalkbackShare your ideas and expertise on this topic

Please add your comment:

  1. You are currently: a Guest |
  2.  

Basic HTML tags that work in comments are: bold (<b></b>), italic (<i></i>), underline (<u></u>), and hyperlink (<a href></a)

advertisement
CXO UnpluggedSmart Business interviews on BNET

See and hear how senior level executives across the Asia Pacific are developing smart business ideas across a variety of sectors. The focus is on the future, and on how businesses need to evolve.

advertisement
  • Click Here
  • Click Here
  • Click Here
advertisement

Content provided in partnership with Thompson Gale