
IBM 15 Switch User Manual


Performance Considerations for Streams and Nodes

The following operations cannot be performed in most databases. They should be placed in the stream after the operations in the preceding list:

• Operations on any nondatabase data, such as flat files
• Merge by order
• Balance
• Distinct operations in discard mode or where only a subset of fields are selected as distinct
• Any operation that requires accessing data from records other than the one being processed
• State and count field derivations
• History node operations
• Operations involving “@” (time-series) functions
• Type-checking modes Warn and Abort
• Model construction, application, and analysis
  Note: Decision trees, rulesets, linear regression, and factor-generated models can generate SQL and can therefore be pushed back to the database.
• Data output to anywhere other than the same database that is processing the data
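The ordering advice above matters because, assuming SQL pushback proceeds from the source and stops at the first operation the database cannot perform, everything after that point runs locally. The following minimal Python sketch illustrates that effect; the `PUSHABLE` set and the `split_for_pushback` function are hypothetical and do not reflect which operations any particular database actually supports.

```python
# Hypothetical set of operation names a database can execute.
# Which operations are really pushable depends on the database and node settings.
PUSHABLE = {"select", "filter", "derive", "aggregate", "sort", "merge_by_key"}

def split_for_pushback(operations):
    """Split an ordered list of operations into the prefix that could be
    translated to SQL and the remainder that would run locally.

    Assumes pushback starts at the source and stops at the first
    operation the database cannot perform.
    """
    pushed = []
    for i, op in enumerate(operations):
        if op not in PUSHABLE:
            # Everything from here on is processed outside the database.
            return pushed, operations[i:]
        pushed.append(op)
    return pushed, []

# A non-database operation placed early blocks pushback of everything after it.
early = ["select", "balance", "aggregate", "sort"]
late = ["select", "aggregate", "sort", "balance"]
print(split_for_pushback(early))  # (['select'], ['balance', 'aggregate', 'sort'])
print(split_for_pushback(late))   # (['select', 'aggregate', 'sort'], ['balance'])
```

Moving the Balance operation to the end lets the aggregation and sort be handled by the database instead of locally.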

Node Caches

To optimize stream running, you can set up a cache on any nonterminal node. When you set up a cache on a node, the cache is filled with the data that passes through the node the next time you run the data stream. From then on, the data is read from the cache (which is stored on disk in a temporary directory) rather than from the data source.
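The fill-once, read-thereafter behavior described above can be illustrated with a small Python sketch. The `CachedNode` class, the JSON file format, and the file name are hypothetical stand-ins chosen for illustration; they are not how IBM SPSS Modeler implements or stores its node caches.

```python
import json
import os
import tempfile

class CachedNode:
    """Hypothetical stand-in for a nonterminal node with caching enabled.

    The first run pulls rows from the upstream source and writes them to a
    file in a temporary directory; later runs read that file instead.
    """

    def __init__(self, upstream):
        self.upstream = upstream  # callable returning a list of rows
        self.cache_dir = tempfile.mkdtemp()
        self.cache_path = os.path.join(self.cache_dir, "node_cache.json")

    def is_cached(self):
        return os.path.exists(self.cache_path)

    def run(self):
        if self.is_cached():
            # Cache already filled: skip the data source entirely.
            with open(self.cache_path) as f:
                return json.load(f)
        rows = self.upstream()  # data passes through the node once
        with open(self.cache_path, "w") as f:
            json.dump(rows, f)
        return rows

calls = 0

def read_source():
    """Hypothetical data source that counts how often it is read."""
    global calls
    calls += 1
    return [{"location": "North", "sales": 120}]

node = CachedNode(read_source)
node.run()    # first run fills the cache
node.run()    # second run is served from the cache file
print(calls)  # 1
```

The source is read only once; every later run of the node reuses the file on disk.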

Caching is most useful following a time-consuming operation such as a sort, merge, or aggregation. For example, suppose that you have a source node set to read sales data from a database and an Aggregate node that summarizes sales by location. You can set up a cache on the Aggregate node rather than on the source node because you want the cache to store the aggregated data rather than the entire data set.
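To make the trade-off in this example concrete, the Python sketch below compares what each cache would hold. The sales rows and the `aggregate_by_location` helper are hypothetical. A cache on the source node keeps every record, so the aggregation still has to be redone on each run; a cache on the Aggregate node keeps only the summarized result.

```python
from collections import defaultdict

# Hypothetical raw sales rows, as a source node might read them.
sales = [
    {"location": "North", "amount": 100},
    {"location": "South", "amount": 250},
    {"location": "North", "amount": 75},
    {"location": "East", "amount": 40},
    {"location": "South", "amount": 60},
]

def aggregate_by_location(rows):
    """Sum sales per location, like an Aggregate node keyed on location."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["location"]] += row["amount"]
    return [{"location": loc, "total": t} for loc, t in sorted(totals.items())]

# Cache on the source node: every record, aggregation still needed downstream.
source_cache = list(sales)
# Cache on the Aggregate node: one already-summarized row per location.
aggregate_cache = aggregate_by_location(sales)

print(len(source_cache), len(aggregate_cache))  # 5 3
```

With realistic data volumes the gap between the two cache sizes, and the repeated aggregation work avoided, is typically much larger than in this toy example.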

Note: Caching at source nodes, which simply stores a copy of the original data as it is read into IBM® SPSS® Modeler, will not improve performance in most circumstances.

Nodes with caching enabled are displayed with a small document icon in the top right corner.

When the data is cached at the node, the document icon is green.
