IBM 15 Switch User Manual


 
Understanding Data Mining
Figure 4-1
CRISP-DM process model
The six phases include:
Business understanding.
This is perhaps the most important phase of data mining. Business
understanding includes determining business objectives, assessing the situation, determining
data mining goals, and producing a project plan.
Data understanding.
Data provides the “raw materials” of data mining. This phase addresses
the need to understand what your data resources are and the characteristics of those resources.
It includes collecting initial data, describing data, exploring data, and verifying data quality.
The Data Audit node available from the Output nodes palette is an indispensable tool for
data understanding.
Data preparation.
After cataloging your data resources, you will need to prepare your data for
mining. Preparations include selecting, cleaning, constructing, integrating, and formatting
data.
Modeling.
This is, of course, the flashy part of data mining, where sophisticated analysis
methods are used to extract information from the data. This phase involves selecting modeling
techniques, generating test designs, and building and assessing models.
Evaluation.
Once you have chosen your models, you are ready to evaluate how the data mining
results can help you to achieve your business objectives. Elements of this phase include
evaluating results, reviewing the data mining process, and determining the next steps.
Deployment.
Now that you have invested all of this effort, it is time to reap the benefits. This
phase focuses on integrating your new knowledge into your everyday business processes to
solve your original business problem. This phase includes planning deployment, monitoring and
maintenance, producing a final report, and reviewing the project.
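As an illustration only, the six phases and the tasks listed for each can be recorded in a small Python structure, for example to drive a project checklist. This is a minimal sketch and not part of the product: the names CRISP_DM_PHASES and checklist are hypothetical, and the task wording is paraphrased from the phase descriptions above.

```python
# Hypothetical sketch: the six CRISP-DM phases, in their usual order,
# mapped to the tasks named in the phase descriptions.
# Python dicts preserve insertion order, so iteration follows the phase order.
CRISP_DM_PHASES = {
    "Business understanding": [
        "determine business objectives",
        "assess the situation",
        "determine data mining goals",
        "produce a project plan",
    ],
    "Data understanding": [
        "collect initial data",
        "describe data",
        "explore data",
        "verify data quality",
    ],
    "Data preparation": [
        "select data",
        "clean data",
        "construct data",
        "integrate data",
        "format data",
    ],
    "Modeling": [
        "select modeling techniques",
        "generate test designs",
        "build models",
        "assess models",
    ],
    "Evaluation": [
        "evaluate results",
        "review the data mining process",
        "determine the next steps",
    ],
    "Deployment": [
        "plan deployment",
        "plan monitoring and maintenance",
        "produce a final report",
        "review the project",
    ],
}


def checklist(phases=CRISP_DM_PHASES):
    """Return one numbered line per phase, listing its tasks."""
    lines = []
    for number, (phase, tasks) in enumerate(phases.items(), start=1):
        lines.append(f"{number}. {phase}: " + "; ".join(tasks))
    return lines


if __name__ == "__main__":
    for line in checklist():
        print(line)
```

The ordered mapping captures only the general tendency of the process; in practice you may revisit an earlier phase at any point.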
There are some key points in this process model. First, while there is a general tendency for the
process to flow through the steps in the order outlined in the previous paragraphs, there are also a
number of places where the phases influence each other in a nonlinear way. For example, data
preparation usually precedes modeling. However, decisions made and information gathered
during the modeling phase can often lead you to rethink parts of the data preparation phase, which
can then present new modeling issues. The two phases feed back on each other until both phases