IBM 15 Switch User Manual


 
Understanding Data Mining
Segmentation nodes
The Auto Cluster node estimates and compares clustering models, which identify
groups of records that have similar characteristics. The node works in the same
manner as other automated modeling nodes, allowing you to experiment with multiple
combinations of options in a single modeling pass. Models can be compared using
basic measures that attempt to filter and rank the usefulness of the cluster
models, and the node also provides a measure based on the importance of particular fields.
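To make "compare and rank cluster models" concrete, here is a minimal sketch, assuming the silhouette coefficient (a standard clustering quality measure) as the ranking criterion. The candidate labelings are hard-coded, hypothetical stand-ins for the output of differently configured models; this is an illustration, not the Auto Cluster node's own implementation.

```python
from __future__ import annotations

import math


def silhouette(points, labels) -> float:
    """Mean silhouette coefficient: near 1 means tight, well-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as uninformative here
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(own) / len(own)  # mean distance within the record's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def rank_models(points, candidates: dict[str, list[int]]):
    """Score each candidate clustering and sort best-first by silhouette."""
    scored = [(name, silhouette(points, labels), len(set(labels)))
              for name, labels in candidates.items()]
    return sorted(scored, key=lambda row: row[1], reverse=True)


points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.3, 4.9), (4.9, 5.2)]
candidates = {  # hypothetical outputs of three differently configured models
    "two clusters": [0, 0, 0, 1, 1, 1],
    "mixed": [0, 1, 0, 1, 0, 1],
    "three clusters": [0, 0, 0, 1, 1, 2],
}
for name, score, k in rank_models(points, candidates):
    print(f"{name}: silhouette={score:.2f}, clusters={k}")
```

A real comparison would typically weigh several measures at once (cluster count, cluster sizes, field importance); a single score keeps the sketch short.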
The K-Means node clusters the data set into distinct groups (or clusters). The method
defines a fixed number of clusters, iteratively assigns records to clusters, and adjusts
the cluster centers until further refinement can no longer improve the model. Instead
of trying to predict an outcome, k-means uses a process known as unsupervised
learning to uncover patterns in the set of input fields.
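The assign-and-adjust loop described above can be sketched as classic Lloyd-style k-means. This is a minimal sketch under stated assumptions: Euclidean distance, centers seeded with the first k records, and a hypothetical `max_iter` cap. SPSS Modeler's own initialization and stopping rules may differ.

```python
from __future__ import annotations

import math


def kmeans(points, k: int, max_iter: int = 100):
    """Assign records to the nearest center, move each center to the mean of
    its records, and stop when no assignment changes."""
    # Simplifying assumption: seed the centers with the first k records.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no record changed cluster, so refinement has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that no target field appears anywhere: the clusters come purely from the input fields, which is what makes the method unsupervised.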
The Kohonen node generates a type of neural network that can be used to cluster the
data set into distinct groups. When the network is fully trained, records that are
similar should be close together on the output map, while records that are different
will be far apart. You can look at the number of observations captured by each unit
in the model nugget to identify the strong units. This may give you a sense of the
appropriate number of clusters.
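The training idea behind a Kohonen network (a self-organizing map) can be sketched with the textbook update rule: find the unit closest to each record and pull it, and its grid neighbors, toward that record. The grid size, linear decay schedules, Gaussian neighborhood, and all function names below are illustrative assumptions, not the Kohonen node's documented settings. The final count per unit mirrors the "observations captured by each unit" view.

```python
from __future__ import annotations

import math
import random
from collections import Counter


def best_matching_unit(weights, x) -> int:
    """Index of the unit whose weight vector is closest (Euclidean) to record x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))


def train_som(data, rows: int, cols: int, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0):
    """Train a rows x cols self-organizing map on numeric records."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Start each unit at a random record so weights begin inside the data range.
    weights = [list(rng.choice(data)) for _ in grid]
    sigma0 = max(rows, cols) / 2  # initial neighborhood radius, in grid steps
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)                   # learning rate decays toward 0
            sigma = max(sigma0 * (1 - frac), 0.5)   # neighborhood shrinks, floored
            br, bc = grid[best_matching_unit(weights, x)]
            for u, (r, c) in enumerate(grid):
                # Gaussian neighborhood: units near the winner on the grid move most.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights


def unit_counts(weights, data) -> Counter:
    """Number of records captured by each unit (units capturing none are absent)."""
    return Counter(best_matching_unit(weights, x) for x in data)


data = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
        (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
weights = train_som(data, rows=2, cols=2)
counts = unit_counts(weights, data)
print(dict(counts))
```

Units with large counts play the role of "strong units"; units that capture few or no records suggest the map has more units than natural groups.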
The TwoStep node uses a two-step clustering method. The first step makes a single
pass through the data to compress the raw input data into a manageable set of
subclusters. The second step uses a hierarchical clustering method to progressively
merge the subclusters into larger and larger clusters. TwoStep has the advantage of
automatically estimating the optimal number of clusters for the training data. It can
handle mixed field types and large data sets efficiently.
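The two steps can be illustrated with a deliberately simplified sketch. Several substitutions are assumptions made for brevity: step 1 uses a flat, threshold-based single pass (a hypothetical `threshold` parameter) instead of a tree of cluster summaries; step 2 merges by Euclidean centroid distance; and the cluster count is chosen at the largest jump in merge distance rather than by an information criterion. It handles numeric fields only, unlike the node itself.

```python
from __future__ import annotations

import math


def centroid(sub):
    """A subcluster is summarized as (count, linear_sum); its centroid is sum / count."""
    n, ls = sub
    return [s / n for s in ls]


def precluster(points, threshold: float):
    """Step 1: one pass, absorbing each record into the nearest subcluster whose
    centroid lies within `threshold`, otherwise starting a new subcluster."""
    subs = []
    for p in points:
        best, best_d = None, math.inf
        for i, sub in enumerate(subs):
            d = math.dist(p, centroid(sub))
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            n, ls = subs[best]
            subs[best] = (n + 1, [s + x for s, x in zip(ls, p)])
        else:
            subs.append((1, list(p)))
    return subs


def merge_hierarchically(subs):
    """Step 2: repeatedly merge the two clusters with the closest centroids.
    Returns the cluster list before each merge and the distance of each merge."""
    clusters = list(subs)
    history, distances = [list(clusters)], []
    while len(clusters) > 1:
        d, i, j = min(
            (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        )
        (ni, li), (nj, lj) = clusters[i], clusters[j]
        merged = (ni + nj, [a + b for a, b in zip(li, lj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        distances.append(d)
        history.append(list(clusters))
    return history, distances


def choose_clusters(history, distances):
    """Heuristic: stop just before the largest jump in merge distance."""
    if len(distances) < 2:
        return history[0]
    jumps = [distances[m] - distances[m - 1] for m in range(1, len(distances))]
    m = 1 + max(range(len(jumps)), key=jumps.__getitem__)
    return history[m]


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0),
          (10.0, 0.0), (10.1, 0.0), (11.0, 0.0), (11.1, 0.0),
          (0.0, 10.0), (0.1, 10.0), (1.0, 10.0), (1.1, 10.0)]
subs = precluster(points, threshold=0.3)
history, distances = merge_hierarchically(subs)
final = choose_clusters(history, distances)
print(len(subs), len(final))        # 6 3
print(sorted(n for n, _ in final))  # [4, 4, 4]
```

The design point the sketch preserves is that step 2 never touches raw records: it works only on the (count, sum) summaries from step 1, which is what keeps the hierarchical step affordable on large data sets.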
The Anomaly Detection node identifies unusual cases, or outliers, that do not conform
to patterns of “normal” data. With this node, it is possible to identify outliers even if
they do not fit any previously known patterns and even if you are not exactly sure
what you are looking for.
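One common way to flag cases that do not fit "normal" patterns, without knowing in advance what to look for, is to compare each record with the group of records it most resembles. The sketch below is a distance-based illustration under stated assumptions: the peer-group labels are hypothetical (in practice they might come from a clustering step), the score is a record's distance to its group centroid divided by the group's average distance, and the cutoff of 2.0 is arbitrary. It is not the node's documented algorithm.

```python
from __future__ import annotations

import math


def anomaly_scores(points, groups):
    """Distance of each record to its peer-group centroid, divided by the group's
    average distance. Values near 1.0 are typical for the group."""
    members: dict[int, list] = {}
    for p, g in zip(points, groups):
        members.setdefault(g, []).append(p)
    centroids = {g: [sum(col) / len(ps) for col in zip(*ps)]
                 for g, ps in members.items()}
    dists = [math.dist(p, centroids[g]) for p, g in zip(points, groups)]
    mean_dist = {}
    for g in members:
        ds = [d for d, gg in zip(dists, groups) if gg == g]
        mean_dist[g] = sum(ds) / len(ds)
    # Guard against identical records, where the average distance is zero.
    return [d / mean_dist[g] if mean_dist[g] > 0 else 0.0
            for d, g in zip(dists, groups)]


points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (1.0, 1.2), (3.0, 3.0),
          (8.0, 8.0), (8.2, 8.0), (8.0, 8.2), (8.2, 8.2)]
groups = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical peer groups
CUTOFF = 2.0  # hypothetical threshold
scores = anomaly_scores(points, groups)
flagged = [i for i, s in enumerate(scores) if s > CUTOFF]
print(flagged)  # [5]
```

Scoring relative to the peer group, rather than against one global threshold, lets a record count as unusual in a tight group even if the same raw distance would be normal in a looser one.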
In-Database Mining Models
SPSS Modeler supports integration with data mining and modeling tools that are available from
database vendors, including Oracle Data Miner, IBM DB2 InfoSphere Warehouse, and Microsoft
Analysis Services. You can build, score, and store models inside the database, all from within the
SPSS Modeler application. For full details, see the SPSS Modeler In-Database Mining Guide,
available on the product DVD.
IBM SPSS Statistics Models
If you have a copy of IBM® SPSS® Statistics installed and licensed on your computer, you can
access and run certain SPSS Statistics routines from within SPSS Modeler to build and score
models.
Further Information
Detailed documentation on the modeling algorithms is also available. For more information, see
the SPSS Modeler Algorithms Guide, available on the product DVD.