Understanding Data Mining
The Self-Learning Response Model (SLRM) node enables you to build a model in
which a single new case, or small number of new cases, can be used to reestimate the
model without having to retrain the model using all data.
The Time Series node estimates exponential smoothing, univariate Autoregressive
Integrated Moving Average (ARIMA), and multivariate ARIMA (or transfer function)
models for time series data and produces forecasts of future performance. A Time
Series node must always be preceded by a Time Intervals node.
The k-Nearest Neighbor (KNN) node associates a new case with the category or value
of the k objects nearest to it in the predictor space, where k is an integer. Similar cases
are near each other and dissimilar cases are distant from each other.
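As an illustration of the nearest-neighbor idea described above, the following minimal Python sketch classifies a new case by majority vote among its k nearest training cases. It is not the KNN node's implementation: the function name knn_predict, the Euclidean distance measure, and the sample data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train, new_case, k=3):
    """Classify new_case by majority vote among its k nearest training cases.

    train is a list of (features, label) pairs, where features is a tuple
    of numbers with the same length as new_case. (Hypothetical helper.)
    """
    # Order the training cases by Euclidean distance to the new case.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], new_case))
    # Keep the labels of the k closest cases.
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Illustrative data: two well-separated groups in a two-field predictor space.
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

print(knn_predict(train, (1.1, 1.0), k=3))
print(knn_predict(train, (5.1, 5.0), k=3))
```

For a continuous target, the same neighbors could be averaged instead of voted on. In practice the predictor fields are usually rescaled first so that no single field dominates the distance.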
Association Models
Association models find patterns in your data where one or more entities (such as events,
purchases, or attributes) are associated with one or more other entities. The models construct rule
sets that define these relationships. Here the fields within the data can act as both inputs and
targets. You could find these associations manually, but association rule algorithms do so much
more quickly, and can explore more complex patterns. Apriori and Carma models are examples of
the use of such algorithms. One other type of association model is a sequence detection model,
which finds sequential patterns in time-structured data.
Association models are most useful when predicting multiple outcomes; for example, customers
who bought product X also bought Y and Z. Association models associate a particular conclusion
(such as the decision to buy something) with a set of conditions. The advantage of association rule
algorithms over the more standard decision tree algorithms (C5.0 and C&RT) is that associations
can exist between any of the attributes. A decision tree algorithm will build rules with only a
single conclusion, whereas association algorithms attempt to find many rules, each of which may
have a different conclusion.
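To make the idea of many rules with different conclusions concrete, the following minimal Python sketch enumerates one-condition rules by brute force and keeps those that meet a support and confidence threshold. It is not the Apriori or Carma algorithm: the function name mine_rules, the thresholds, and the basket data are assumptions for illustration, and real algorithms prune the search rather than testing every pair.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (condition, conclusion, support, confidence) tuples.

    transactions is a list of sets of items. Only rules with a single
    condition and a single conclusion are considered. (Hypothetical helper.)
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Any item may act as the conclusion, so test both directions.
        for condition, conclusion in ((a, b), (b, a)):
            confidence = pair_support / support({condition})
            if confidence >= min_confidence:
                rules.append((condition, conclusion, pair_support, confidence))
    return rules


# Illustrative market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

for condition, conclusion, sup, conf in mine_rules(baskets):
    print(f"{condition} -> {conclusion} (support={sup:.2f}, confidence={conf:.2f})")
```

Unlike a decision tree, which would be trained for one target field, the resulting rule set contains rules that conclude bread, milk, and eggs alike, because every field can serve as either an input or a target.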
Association nodes
The Apriori node extracts a set of rules from the data, pulling out the rules with
the highest information content. Apriori offers five different methods of selecting
rules and uses a sophisticated indexing scheme to process large data sets efficiently.
For large problems, Apriori is generally faster to train; it has no arbitrary limit on
the number of rules that can be retained, and it can handle rules with up to 32