IBM 15 Switch User Manual


 
Performance Considerations for Streams and Nodes
The following operations cannot be performed in most databases. They should be placed in the stream after the operations in the preceding list:
Operations on any nondatabase data, such as flat files
Merge by order
Balance
Distinct operations in discard mode or where only a subset of fields are selected as distinct
Any operation that requires accessing data from records other than the one being processed
State and count field derivations
History node operations
Operations involving @ (time-series) functions
Type-checking modes Warn and Abort
Model construction, application, and analysis
Note: Decision trees, rulesets, linear regression, and factor-generated models can generate SQL and can therefore be pushed back to the database.
Data output to anywhere other than the same database that is processing the data
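The ordering advice above can be illustrated with a small Python sketch. It assumes that SQL pushback covers only the leading run of operations that can be translated to SQL, so the first operation from the list above stops pushback and everything after it runs locally. The `Op` class, the `split_pushback` function, and the treatment of Select and Aggregate as SQL-capable operations are hypothetical illustrations, not part of the SPSS Modeler API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stream operation; sql_ok marks whether it can be expressed as SQL."""
    name: str
    sql_ok: bool


def split_pushback(stream):
    """Split a stream into (run in the database, run locally).

    Assumption: pushback stops at the first operation that cannot be
    translated to SQL; that operation and everything after it run locally.
    """
    for i, op in enumerate(stream):
        if not op.sql_ok:
            return list(stream[:i]), list(stream[i:])
    return list(stream), []


select = Op("Select", True)        # assumed SQL-capable for this sketch
aggregate = Op("Aggregate", True)  # assumed SQL-capable for this sketch
history = Op("History", False)     # History node operations appear in the list above

# History placed early: Aggregate is forced to run locally as well.
pushed, local = split_pushback([select, history, aggregate])
print([op.name for op in pushed], [op.name for op in local])

# History placed last: both Select and Aggregate can run in the database.
pushed, local = split_pushback([select, aggregate, history])
print([op.name for op in pushed], [op.name for op in local])
```

Moving an operation later in a stream is only appropriate when it does not change the stream's results; the sketch shows only how the placement affects the amount of work left in the database.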
Node Caches
To optimize stream running, you can set up a cache on any nonterminal node. When you set up a cache on a node, the cache is filled with the data that passes through the node the next time you run the data stream. From then on, the data is read from the cache (which is stored on disk in a temporary directory) rather than from the data source.
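The fill-once, read-thereafter behavior of a node cache can be sketched in Python. This is a simplified model, not how SPSS Modeler implements caching: `CachedNode` is a hypothetical class, the upstream work is any callable, and `pickle` with a file from `tempfile.mkstemp` stands in for the on-disk cache in a temporary directory.

```python
import os
import pickle
import tempfile


class CachedNode:
    """Hypothetical model of a node cache: the first run fills a disk cache,
    and later runs read from it instead of repeating the upstream work."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable that produces the node's data
        fd, self.path = tempfile.mkstemp(suffix=".cache")  # file in the temp directory
        os.close(fd)
        self.filled = False

    def run(self):
        if self.filled:
            # Later runs: read from the cache, not from the data source.
            with open(self.path, "rb") as f:
                return pickle.load(f)
        # First run: data passes through the node and fills the cache.
        data = self.upstream()
        with open(self.path, "wb") as f:
            pickle.dump(data, f)
        self.filled = True
        return data


# Usage: count how often the (hypothetical) data source is actually read.
calls = []


def read_source():
    calls.append(1)
    return [("North", 120.0), ("South", 75.5)]


node = CachedNode(read_source)
first = node.run()
second = node.run()
print(len(calls), first == second)  # 1 True
```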
Caching is most useful following a time-consuming operation such as a sort, merge, or aggregation. For example, suppose that you have a source node set to read sales data from a database and an Aggregate node that summarizes sales by location. You can set up a cache on the Aggregate node rather than on the source node because you want the cache to store the aggregated data rather than the entire data set.
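The advantage of caching at the Aggregate node in the example above comes down to how many rows the cache has to hold. The Python sketch below uses made-up sales records and a hypothetical `aggregate_by_location` function to show that the aggregated result has one row per location, whereas the source data has one row per sale.

```python
from collections import defaultdict

# Hypothetical raw sales records as a source node might read them: (location, amount).
sales = [
    ("North", 120.0), ("South", 75.5), ("North", 30.0),
    ("East", 42.0), ("South", 10.0), ("North", 8.0),
]


def aggregate_by_location(rows):
    """Sum sales amounts per location, similar to what an Aggregate node produces."""
    totals = defaultdict(float)
    for location, amount in rows:
        totals[location] += amount
    return dict(totals)


summary = aggregate_by_location(sales)

# A cache on the source node would hold every record; a cache on the
# Aggregate node holds only one row per location.
print(len(sales), len(summary))  # 6 3
```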
Note: Caching at source nodes, which simply stores a copy of the original data as it is read into IBM® SPSS® Modeler, will not improve performance in most circumstances.
Nodes with caching enabled are displayed with a small document icon at the top right corner.
When the data is cached at the node, the document icon is green.