Chapter 13
Aggregate. When the Keys are contiguous option is not set, this node reads (but does not store) its entire input data set before it produces any aggregated output. In more extreme situations, where the size of the aggregated data reaches a limit (determined by the SPSS Modeler Server configuration option Memory usage multiplier), the remainder of the data set is sorted and processed as if the Keys are contiguous option were set. When this option is set, no data is stored because the aggregated output records are produced as the input data is read.
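The contiguous-keys case can be sketched in a few lines: when the input arrives already grouped by key, only the current group's accumulator has to be held in memory, and aggregated records can be emitted while the data is still being read. This is an illustrative Python sketch, not SPSS Modeler code; the record layout and function names are assumptions.

```python
from itertools import groupby

def aggregate_contiguous(records, key, agg):
    """Stream aggregated output as the input is read.

    Assumes records arrive already grouped by key (the
    'Keys are contiguous' case), so only one group is
    in memory at a time instead of the whole data set.
    """
    for k, group in groupby(records, key=key):
        yield k, agg(group)

rows = [("a", 1), ("a", 2), ("b", 5), ("b", 1)]
totals = list(aggregate_contiguous(rows, key=lambda r: r[0],
                                   agg=lambda g: sum(v for _, v in g)))
# totals == [("a", 3), ("b", 6)]
```

If the keys were not contiguous, `groupby` would emit one partial group per run of equal keys, which is why the real node must first read (or sort) the whole input in that case.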
Distinct. The Distinct node stores all of the unique key fields in the input data set; in cases where all fields are key fields and all records are unique, it stores the entire data set. By default, the Distinct node sorts the data on the key fields and then selects (or discards) the first distinct record from each group. For smaller data sets with a low number of distinct keys, or for data sets that have been pre-sorted, you can choose options to improve the speed and efficiency of processing.
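The pre-sorted case explains why those options help: once the records are ordered on the key fields, picking the first distinct record from each group needs only the previous key to be remembered, not the whole data set. A minimal Python sketch of that selection (the record layout is an assumption):

```python
def distinct_first(records, key):
    """Select the first record from each run of equal keys.

    With pre-sorted input only the previous key is kept in
    memory; each later record in a run is discarded on sight.
    """
    prev = object()  # sentinel that compares unequal to any key
    for rec in records:
        k = key(rec)
        if k != prev:
            yield rec
            prev = k

rows = [("a", 1), ("a", 2), ("b", 5)]
firsts = list(distinct_first(rows, key=lambda r: r[0]))
# firsts == [("a", 1), ("b", 5)]
```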
Type. In some instances, the Type node caches the input data when reading values; the cache is used for downstream processing. The cache requires sufficient disk space to store the entire data set, but it speeds up processing.
Evaluation. The Evaluation node must sort the input data to compute tiles. The sort is repeated for each model evaluated, because the scores and consequent record order are different in each case. The running time is M*N*log(N), where M is the number of models and N is the number of records.
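The M*N*log(N) cost follows directly from repeating one O(N log N) sort per model. A hypothetical Python sketch of tile assignment makes the structure visible; the tile scheme here (equal-sized bands by descending score) is an illustrative assumption, not the node's exact algorithm:

```python
def tiles(scores, n_tiles):
    """Assign each record a tile by sorting on its score.

    One full O(N log N) sort of the N records per model.
    """
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i], reverse=True)
    assignment = [0] * len(scores)
    for rank, i in enumerate(order):
        assignment[i] = rank * n_tiles // len(scores) + 1
    return assignment

# Two models => two independent sorts of the same records,
# because each model orders the records differently.
model_a = [0.9, 0.1, 0.5, 0.7]
model_b = [0.2, 0.8, 0.6, 0.4]
per_model = [tiles(s, 2) for s in (model_a, model_b)]
# per_model == [[1, 2, 2, 1], [2, 1, 1, 2]]
```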
Performance: Modeling Nodes
Neural Net and Kohonen. Neural network training algorithms (including the Kohonen algorithm) make many passes over the training data. The data is stored in memory up to a limit, and the excess is spilled to disk. Accessing the training data from disk is expensive because the access method is random, which can lead to excessive disk activity. You can disable the use of disk storage for these algorithms, forcing all data to be stored in memory, by selecting the Optimize for speed option on the Model tab of the node’s dialog box. Note that if the amount of memory required to store the data is greater than the working set of the server process, part of it will be paged to disk and performance will suffer accordingly.
When Optimize for memory is enabled, a percentage of physical RAM is allocated to the algorithm according to the value of the IBM® SPSS® Modeler Server configuration option Modeling memory limit percentage. To use more memory for training neural networks, either provide more RAM or increase the value of this option, but note that setting the value too high will cause paging.
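The arithmetic behind that option is simple: the memory made available to training is the configured percentage of physical RAM. A small illustrative calculation (the 25% figure is an assumed example value, not a documented default):

```python
def modeling_memory_bytes(physical_ram_bytes, limit_percentage):
    """Memory available to the training algorithm under the
    Modeling memory limit percentage option (illustrative)."""
    return physical_ram_bytes * limit_percentage // 100

# e.g. 8 GiB of RAM with an assumed setting of 25%:
allocated = modeling_memory_bytes(8 * 1024**3, 25)
# allocated == 2 * 1024**3 bytes, i.e. 2 GiB
```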
The running time of the neural network algorithms depends on the required level of accuracy. You can control the running time by setting a stopping condition in the node’s dialog box.
K-Means. The K-Means clustering algorithm has the same options for controlling memory usage as the neural network algorithms. Performance on data stored on disk is better, however, because access to the data is sequential.
Performance: CLEM Expressions
CLEM sequence functions (“@ functions”) that look back into the data stream must store enough of the data to satisfy the longest look-back. For operations whose degree of look-back is unbounded, all values of the field must be stored. An unbounded operation is one where the