IBM 15 Manual

Chapter 4

Understanding Data Mining

Data Mining Overview

Through a variety of techniques, data mining identifies nuggets of information in bodies of data. Data mining extracts information in such a way that it can be used in areas such as decision support, prediction, forecasts, and estimation. Data is often voluminous but of low value and with little direct usefulness in its raw form. It is the hidden information in the data that has value.

In data mining, success comes from combining your (or your expert's) knowledge of the data with advanced, active analysis techniques in which the computer identifies the underlying relationships and features in the data. The process of data mining generates models from historical data that are later used for predictions, pattern detection, and more. The technique for building these models is called machine learning or modeling.
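
As a rough illustration of generating a model from historical data and later using it for prediction, the following Python sketch fits a straight line by ordinary least squares and applies it to a new value. It is not how SPSS Modeler builds models; the function names (fit_line, predict) and the sample records are assumptions made up for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical records: (advertising spend, units sold).
history = [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0), (4.0, 18.0)]

# Modeling step: learn the relationship from historical data.
model = fit_line([x for x, _ in history], [y for _, y in history])

# Prediction step: use the model on a value not seen before.
forecast = predict(model, 5.0)
```

The model here is only the pair (slope, intercept). The same two-step pattern, building from history and then scoring new records, applies to the more advanced techniques described in this chapter.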

Modeling Techniques

IBM® SPSS® Modeler includes a number of machine-learning and modeling technologies, which can be roughly grouped according to the types of problems they are intended to solve.

• Predictive modeling methods include decision trees, neural networks, and statistical models.

• Clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong. Clustering methods include Kohonen, k-means, and TwoStep.

• Association rules associate a particular conclusion (such as the purchase of a particular product) with a set of conditions (the purchase of several other products).

• Screening models can be used to screen data to locate fields and records that are most likely to be of interest in modeling and identify outliers that may not fit known patterns. Available methods include feature selection and anomaly detection.
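
To make the idea of an association rule concrete, the following Python sketch measures how often a conclusion holds when a set of conditions is present. It is a minimal illustration, not the algorithm SPSS Modeler uses. The function name rule_metrics and the shopping baskets are hypothetical, and the definition of support used here (the fraction of all transactions that contain both conditions and conclusion) is an assumption, since tools differ on this term.

```python
def rule_metrics(transactions, conditions, conclusion):
    """Return (support, confidence) for the rule conditions -> conclusion.

    support:    fraction of all transactions that contain the conditions
                and the conclusion (an assumed definition; tools vary).
    confidence: fraction of transactions containing the conditions that
                also contain the conclusion.
    """
    conditions = set(conditions)
    with_conditions = [t for t in transactions if conditions <= t]
    with_both = [t for t in with_conditions if conclusion in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_conditions) if with_conditions else 0.0
    return support, confidence


# Hypothetical shopping baskets, one set of products per transaction.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam", "milk"},
]

# Rule: customers who buy bread and butter also buy jam.
support, confidence = rule_metrics(baskets, {"bread", "butter"}, "jam")
```

In this sample, three baskets contain both bread and butter, and two of those also contain jam, so the rule holds in two out of three cases where its conditions apply.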

Data Manipulation and Discovery

SPSS Modeler also includes many facilities that let you apply your expertise to the data:



• Data manipulation. Constructs new data items derived from existing ones and breaks down the data into meaningful subsets. Data from a variety of sources can be merged and filtered.

• Browsing and visualization. Displays aspects of the data using the Data Audit node to perform an initial audit including graphs and statistics. Advanced visualization includes interactive graphics, which can be exported for inclusion in project reports.

• Statistics. Confirms suspected relationships between variables in the data. Statistics from IBM® SPSS® Statistics can also be used within SPSS Modeler.

• Hypothesis testing. Constructs models of how the data behaves and verifies these models.
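
The data manipulation facility, which merges sources, derives new items, and filters records into subsets, can be sketched in plain Python as follows. In SPSS Modeler these operations are performed with nodes rather than code, so this is only a conceptual illustration. The field names, the prepare function, and the 0.3 threshold are hypothetical.

```python
# Hypothetical customer records from one source.
customers = [
    {"id": 1, "income": 52000, "debt": 13000},
    {"id": 2, "income": 38000, "debt": 19000},
    {"id": 3, "income": 61000, "debt": 6100},
]

# Hypothetical second source, keyed by customer id.
regions = {1: "north", 2: "south", 3: "north"}


def prepare(records, region_by_id, max_ratio):
    """Merge a second source, derive a new field, and filter records."""
    prepared = []
    for rec in records:
        row = dict(rec)  # copy so the input records are left unchanged
        row["region"] = region_by_id.get(rec["id"])      # merge
        row["debt_ratio"] = rec["debt"] / rec["income"]  # derive
        if row["debt_ratio"] <= max_ratio:               # filter
            prepared.append(row)
    return prepared


# Keep only customers whose debt is at most 30% of income.
low_debt = prepare(customers, regions, max_ratio=0.3)
```

The derived field debt_ratio does not exist in either source. It is the kind of new data item that expert knowledge of the data suggests, and the filter then uses it to define a meaningful subset.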

© Copyright IBM Corporation 1994, 2012.
