IBM 15 Switch User Manual


 
Chapter 6
Screening or Removing Fields
To screen out fields with too many missing values, you have several options:
You can use a Data Audit node to filter fields based on quality.
You can use a Feature Selection node to screen out fields with more than a specified percentage
of missing values and to rank fields based on importance relative to a specified target.
Instead of removing the fields, you can use a Type node to set the field role to None. This will
keep the fields in the data set but exclude them from the modeling processes.
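The percentage-based screening described above can be illustrated outside the product. The following is a minimal Python sketch, not the Feature Selection node's implementation: the function names `missing_fraction` and `screen_fields`, the `max_missing` parameter, and the sample records are all hypothetical, and a missing value is assumed to be represented by `None`.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_fraction(records: List[Record], field: str) -> float:
    """Fraction of records in which `field` is missing (None or absent)."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def screen_fields(records: List[Record], fields: List[str],
                  max_missing: float = 0.5) -> List[str]:
    """Keep only the fields whose missing fraction is at most `max_missing`."""
    return [f for f in fields if missing_fraction(records, f) <= max_missing]


# Hypothetical sample data: `income` is missing in three of four records.
records = [
    {"age": 34, "income": None, "score": 0.7},
    {"age": None, "income": None, "score": 0.4},
    {"age": 51, "income": 42000, "score": None},
    {"age": 29, "income": None, "score": 0.9},
]

kept = screen_fields(records, ["age", "income", "score"], max_missing=0.5)
print(kept)
```

With a 50% threshold, `income` is screened out because it is missing in 75% of the records, while `age` and `score` are kept.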
Imputing or Filling Missing Values
In cases where there are only a few missing values, it may be useful to insert values to replace
the blanks. You can do this from the Data Audit report, which allows you to specify options for
specific fields as appropriate and then generate a SuperNode that imputes values using a number
of methods. This is the most flexible method, and it also allows you to specify handling for large
numbers of fields in a single node.
The following methods are available for imputing missing values:
Fixed. Substitutes a fixed value (either the field mean, midpoint of the range, or a constant that
you specify).
Random. Substitutes a random value based on a normal or uniform distribution.
Expression. Allows you to specify a custom expression. For example, you could replace values
with a global variable created by the Set Globals node.
Algorithm. Substitutes a value predicted by a model based on the C&RT algorithm. For each field
imputed using this method, there will be a separate C&RT model, along with a Filler node that
replaces blanks and nulls with the value predicted by the model. A Filter node is then used to
remove the prediction fields generated by the model.
Alternatively, to coerce values for specific fields, you can use a Type node to ensure that the
field types cover only legal values and then set the Check column to Coerce for the fields whose
blank values need replacing.
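To make the Fixed and Random methods listed above concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce the generated SuperNode: the function names `impute_fixed` and `impute_random` are hypothetical, missing values are assumed to be `None`, and the distribution parameters for the Random method are assumed to be estimated from the observed values (the source does not say how they are chosen).

```python
import random
from statistics import mean, pstdev
from typing import List, Optional


def impute_fixed(values: List[Optional[float]], method: str = "mean",
                 constant: float = 0.0) -> List[float]:
    """Replace None with the mean, the midpoint of the range, or a constant."""
    present = [v for v in values if v is not None]
    if method == "constant":
        fill = constant
    elif not present:
        raise ValueError("no observed values to derive a fill value from")
    elif method == "mean":
        fill = mean(present)
    elif method == "midrange":
        fill = (min(present) + max(present)) / 2
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]


def impute_random(values: List[Optional[float]], distribution: str = "uniform",
                  seed: Optional[int] = None) -> List[float]:
    """Replace None with a random draw; parameters come from observed values (assumption)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no observed values to derive a distribution from")
    rng = random.Random(seed)
    if distribution == "uniform":
        low, high = min(present), max(present)
        return [rng.uniform(low, high) if v is None else v for v in values]
    if distribution == "normal":
        mu, sigma = mean(present), pstdev(present)
        return [rng.gauss(mu, sigma) if v is None else v for v in values]
    raise ValueError(f"unknown distribution: {distribution}")


# Hypothetical field with two missing values.
ages = [20.0, None, 30.0, 70.0, None]
print(impute_fixed(ages, "mean"))      # fills with the mean of 20, 30, 70
print(impute_fixed(ages, "midrange"))  # fills with (20 + 70) / 2
print(impute_random(ages, "uniform", seed=1))
```

Passing a `seed` makes the random fill reproducible, which is convenient when comparing models built on the imputed data.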
CLEM Functions for Missing Values
There are several functions used to handle missing values. The following functions are often used
in Select and Filler nodes to discard or fill missing values:
count_nulls(LIST)
@BLANK(FIELD)
@NULL(FIELD)
undef
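As a rough illustration of what these functions do, the following Python sketch mimics their behavior. It is an analogy and not CLEM code: `None` stands in for the null value that `undef` produces, the helper names `is_null`, `is_blank`, and `count_nulls` are hypothetical, and the set of values that count as blank is passed in explicitly on the assumption that blank definitions are configured per field elsewhere in the stream.

```python
from typing import Any, Collection, Iterable, List

UNDEF = None  # stand-in for the null value produced by `undef`


def is_null(value: Any) -> bool:
    """Analog of @NULL(FIELD): true when the value is null."""
    return value is None


def is_blank(value: Any, blanks: Collection[Any], null_is_blank: bool = True) -> bool:
    """Analog of @BLANK(FIELD): true when the value matches a blank definition.

    Whether null counts as blank is treated as a setting (assumption).
    """
    if value is None:
        return null_is_blank
    return value in blanks


def count_nulls(values: Iterable[Any]) -> int:
    """Analog of count_nulls(LIST): number of null values in the list."""
    return sum(1 for v in values if v is None)


# Hypothetical records: -1 is defined as a blank for `income`.
rows = [
    {"age": 34, "income": -1},
    {"age": UNDEF, "income": 42000},
    {"age": UNDEF, "income": UNDEF},
]

# Select-style use: discard records where every listed field is null.
selected: List[dict] = [r for r in rows if count_nulls([r["age"], r["income"]]) < 2]

# Filler-style use: replace blank incomes with a fixed value.
filled = [
    {**r, "income": 0 if is_blank(r["income"], blanks={-1}) else r["income"]}
    for r in selected
]
print(filled)
```

The first step mirrors discarding records in a Select node, and the second mirrors replacing values in a Filler node.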