IBM 15 Switch User Manual


 
Understanding Data Mining
Classification nodes
The Auto Classifier node creates and compares a number of different models for
binary outcomes (yes or no, churn or do not churn, and so on), allowing you to
choose the best approach for a given analysis. A number of modeling algorithms are
supported, making it possible to select the methods you want to use, the specific
options for each, and the criteria for comparing the results. The node generates a set
of models based on the specified options and ranks the best candidates according to
the criteria you specify.
The Auto Numeric node estimates and compares models for continuous numeric
range outcomes using a number of different methods. The node works in the same
manner as the Auto Classifier node, allowing you to choose the algorithms to use
and to experiment with multiple combinations of options in a single modeling pass.
Supported algorithms include neural networks, C&R Tree, CHAID, linear regression,
generalized linear regression, and support vector machines (SVM). Models can be
compared based on correlation, relative error, or number of variables used.
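The comparison criteria above can be illustrated with a short Python sketch. It is not the node's implementation: the model names and predictions are hypothetical, and relative error is assumed here to mean the squared error of the predictions divided by the squared deviation of the observed values from their mean.

```python
import math

def pearson_correlation(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def relative_error(observed, predicted):
    """Assumed definition: error around the predictions / error around the mean."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return sse / sst

observed = [10.0, 20.0, 30.0, 40.0]

# Hypothetical predictions from two candidate models.
candidates = {
    "model_a": [12.0, 18.0, 33.0, 37.0],
    "model_b": [25.0, 25.0, 25.0, 26.0],
}

# Rank candidates so that the lowest relative error comes first.
ranked = sorted(candidates, key=lambda name: relative_error(observed, candidates[name]))
```

Under this definition, a relative error near 0 indicates predictions close to the observed values, and a value near 1 indicates a model that does little better than always predicting the mean.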
The Classification and Regression (C&R) Tree node generates a decision tree that
allows you to predict or classify future observations. The method uses recursive
partitioning to split the training records into segments by minimizing the impurity
at each step, where a node in the tree is considered “pure” if 100% of cases in the
node fall into a specific category of the target field. Target and input fields can
be numeric ranges or categorical (nominal, ordinal, or flags); all splits are binary
(only two subgroups).
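To make the idea of impurity concrete, the following Python sketch scores a binary split. It assumes the Gini index as the impurity measure, which is one common choice and not necessarily the one configured in the node; the function names and sample labels are illustrative only.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 0.0 for a pure node, larger for more mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted impurity of the two subgroups produced by a binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

parent = ["yes", "yes", "yes", "no", "no", "no"]

# A split that separates the categories completely yields two pure nodes.
pure_split = split_impurity(["yes", "yes", "yes"], ["no", "no", "no"])

# A split that leaves both subgroups mixed reduces impurity only slightly.
mixed_split = split_impurity(["yes", "no", "yes"], ["no", "yes", "no"])
```

A tree builder following this approach would evaluate each candidate split and keep the one with the lowest weighted impurity.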
The QUEST node provides a binary classification method for building decision trees,
designed to reduce the processing time required for large C&R Tree analyses while
also reducing the tendency found in classification tree methods to favor inputs that
allow more splits. Input fields can be numeric ranges (continuous), but the target field
must be categorical. All splits are binary.
The CHAID node generates decision trees using chi-square statistics to identify
optimal splits. Unlike the C&R Tree and QUEST nodes, CHAID can generate
nonbinary trees, meaning that some splits have more than two branches. Target and
input fields can be numeric range (continuous) or categorical. Exhaustive CHAID is
a modification of CHAID that does a more thorough job of examining all possible
splits but takes longer to compute.
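The chi-square statistic behind this kind of split selection can be sketched in Python as follows. This shows only the Pearson chi-square calculation for a table of counts; it does not reproduce the category merging or significance adjustments that a full CHAID procedure performs, and the tables are made-up examples.

```python
def chi_square_statistic(table):
    """Pearson chi-square for a table of counts.

    Each row is one branch of a candidate split and each column is one
    target category. Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# A three-branch (nonbinary) split where the target mix differs by branch.
informative_split = [[30, 10], [20, 20], [10, 30]]

# A split where every branch has the same target mix.
uninformative_split = [[20, 20], [20, 20], [20, 20]]

strong = chi_square_statistic(informative_split)
weak = chi_square_statistic(uninformative_split)
```

A larger statistic indicates a stronger association between the branches and the target, so the first table would be preferred over the second.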
The C5.0 node builds either a decision tree or a rule set. The model works by splitting
the sample based on the field that provides the maximum information gain at each
level. The target field must be categorical. Multiple splits into more than two
subgroups are allowed.
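Information gain can be illustrated with a small Python sketch. It computes plain entropy-based gain for a split into any number of subgroups; the actual C5.0 algorithm may apply further refinements that are not shown here, and the labels are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of the parent minus the size-weighted entropy of the subgroups."""
    n = len(labels)
    remainder = sum(len(group) / n * entropy(group) for group in groups)
    return entropy(labels) - remainder

labels = ["yes"] * 4 + ["no"] * 4

# A three-way split: two pure subgroups and one that is still mixed.
groups = [["yes", "yes", "yes"], ["no", "no", "no"], ["yes", "no"]]

gain = information_gain(labels, groups)
```

The field whose split gives the highest gain would be chosen at that level of the tree.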
The Decision List node identifies subgroups, or segments, that show a higher or lower
likelihood of a given binary outcome relative to the overall population. For example,
you might look for customers who are unlikely to churn or are most likely to respond
favorably to a campaign. You can incorporate your business knowledge into the
model by adding your own custom segments and previewing alternative models side
by side to compare the results. Decision List models consist of a list of rules in which
each rule has a condition and an outcome. Rules are applied in order, and the first rule
that matches determines the outcome.
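The first-match behavior of such a rule list can be sketched in Python. The rules, field names, and the default outcome for records that match no rule are hypothetical and chosen only for illustration.

```python
def apply_decision_list(rules, record, default_outcome):
    """Return the outcome of the first rule whose condition matches the record."""
    for condition, outcome in rules:
        if condition(record):
            return outcome
    # No rule matched; fall back to an assumed default outcome.
    return default_outcome

# Hypothetical rules: each is a (condition, outcome) pair, checked in order.
rules = [
    (lambda r: r["complaints"] >= 3, "likely to churn"),
    (lambda r: r["tenure_months"] >= 24, "unlikely to churn"),
]

# This customer satisfies both conditions, but only the first rule applies.
customer = {"complaints": 4, "tenure_months": 36}
result = apply_decision_list(rules, customer, "no segment")
```

Because evaluation stops at the first match, reordering the rules can change the outcome for records that satisfy more than one condition.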
Linear regression models predict a continuous target based on linear relationships
between the target and one or more predictors.
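For the simplest case of a single predictor, the fitted relationship can be sketched in Python using ordinary least squares. This is an illustration of the general technique with made-up data, not the procedure used by the node.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares intercept and slope for one predictor.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative data that lie exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_linear_regression(xs, ys)

# Predict the target for a new predictor value.
prediction = intercept + slope * 5.0
```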