Example 17
exclude only persons whose incomes you do not know. Similarly, in computing the
sample covariance between age and income, you would exclude an observation only if
age is missing or if income is missing. This approach to missing data is sometimes
called pairwise deletion.
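The difference between the two deletion schemes can be sketched in a few lines of Python. The records and function names below are hypothetical and are not part of Amos. The covariance uses the means of the retained pairs, which is one common convention.

```python
from statistics import mean

# Hypothetical toy records (age, income in thousands); None marks a missing value.
records = [
    (25, 30.0),
    (32, None),
    (41, 52.0),
    (None, 47.0),
    (58, 61.0),
    (36, 44.0),
]


def listwise(rows):
    """Listwise deletion: keep only rows with no missing values."""
    return [r for r in rows if None not in r]


def pairwise_mean(rows, j):
    """Mean of column j over every row in which column j is observed."""
    return mean(r[j] for r in rows if r[j] is not None)


def pairwise_cov(rows, j, k):
    """Sample covariance of columns j and k, dropping a row only if
    column j or column k is missing (pairwise deletion)."""
    pairs = [(r[j], r[k]) for r in rows if r[j] is not None and r[k] is not None]
    mj = mean(a for a, _ in pairs)
    mk = mean(b for _, b in pairs)
    return sum((a - mj) * (b - mk) for a, b in pairs) / (len(pairs) - 1)


print(len(listwise(records)))       # rows left after listwise deletion
print(pairwise_mean(records, 1))    # mean income from all five persons who reported it
print(pairwise_cov(records, 0, 1))  # age-income covariance from the four complete pairs
```

Five persons report income but only four of them also report age, so the pairwise mean of income uses one more observation than listwise deletion would.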
A third approach is data imputation, replacing the missing values with some kind
of guess, and then proceeding with a conventional analysis appropriate for complete
data. For example, you might compute the mean income of the persons who reported
their income, and then assign that mean to every person who did not report an
income. Beale and Little (1975) discuss methods for data imputation, which are
implemented in many statistical packages.
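The mean-imputation example can be sketched as follows. The incomes are made up, and `mean_impute` is a hypothetical helper, not a function from Amos or any statistics package.

```python
from statistics import mean, variance


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical incomes in thousands; None = did not report.
incomes = [30.0, None, 52.0, 47.0, None, 61.0, 44.0]
completed = mean_impute(incomes)
observed = [v for v in incomes if v is not None]

print(completed)            # both gaps filled with the observed mean
print(variance(observed))   # sample variance of the five reported incomes
print(variance(completed))  # smaller: imputed values add no spread but raise n
```

Because every imputed value sits exactly at the mean, the completed column has a smaller sample variance than the reported values alone, and a conventional analysis then treats the guesses as if they were real data.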
Amos does not use any of these methods. Even in the presence of missing data, it
computes maximum likelihood estimates (Anderson, 1957). For this reason, whenever
you have missing data, you may prefer to use Amos even for a conventional analysis,
such as a simple regression (as in Example 4) or the estimation of means (as in Example 13).
There is, however, one kind of missing data that Amos cannot handle. (Neither can
any other general approach to missing data, such as the three mentioned above.)
Sometimes the very fact that a value is missing conveys
information. It could be, for example, that people with very high incomes tend (more
than others) not to answer questions about income. Failure to respond may thus convey
probabilistic information about a person’s income level, beyond the information
already given in the observed data. If this is the case, the approach to missing data that
Amos uses is inapplicable.
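A short simulation makes the problem concrete. The income distribution and the response rule below are invented for illustration: persons with incomes of 50 or more skip the question half the time, others only one time in ten. The reported incomes then understate the mean income of everyone, which is the kind of bias described above.

```python
import random
from statistics import mean


def simulate_nonresponse(n=10_000, seed=1):
    """Return (mean of all incomes, mean of reported incomes) when the
    chance of not reporting depends on the income itself."""
    rng = random.Random(seed)
    # Hypothetical income distribution, in thousands.
    incomes = [rng.gauss(50.0, 15.0) for _ in range(n)]
    # Hypothetical rule: persons at or above 50 skip the question more often.
    reported = [v for v in incomes if rng.random() >= (0.5 if v >= 50.0 else 0.1)]
    return mean(incomes), mean(reported)


full_mean, reported_mean = simulate_nonresponse()
print(full_mean, reported_mean)  # the reported mean falls well below the full mean
```

Here the chance of a value being missing depends on that unobserved value, so no method that uses only the observed data under the missing-at-random assumption can remove the bias.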
Amos assumes that missing values are missing at random. It is not always easy to
know whether this assumption is valid or what it means in practice (Rubin, 1976).
If the missing-at-random condition is satisfied, however, Amos provides estimates
that are efficient and consistent. By contrast, the methods
mentioned previously do not provide efficient estimates, and provide consistent
estimates only under the stronger condition that missing data are missing completely
at random (Little and Rubin, 1989).
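The contrast between missing at random and missing completely at random can be seen in a small simulation. This is not how Amos is implemented; it is a sketch of the simplest case in which the maximum likelihood estimate has a closed form, assuming a bivariate normal pair in which x is always observed and y is sometimes missing. Under that assumption, the ML estimate of the mean of y is the complete-case regression line evaluated at the mean of all x values. The data-generating values and the missingness rule are invented.

```python
import random
from statistics import mean


def simulate_mar(n=20_000, seed=7):
    """Return (complete-case mean of y, ML mean of y) when y is missing
    at random given a fully observed x. The true mean of y is 2.0."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + 1.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    # MAR: the chance that y is missing depends only on x, which is always observed.
    keep = [rng.random() >= (0.7 if x > 0 else 0.1) for x in xs]
    xc = [x for x, k in zip(xs, keep) if k]
    yc = [y for y, k in zip(ys, keep) if k]
    mx, my = mean(xc), mean(yc)
    # Least-squares slope of y on x among the complete cases.
    slope = (sum((a - mx) * (b - my) for a, b in zip(xc, yc))
             / sum((a - mx) ** 2 for a in xc))
    # ML estimate for this pattern: the fitted line evaluated at the mean of ALL x.
    ml_mean = my + slope * (mean(xs) - mx)
    return my, ml_mean


cc_mean, ml_mean = simulate_mar()
print(cc_mean, ml_mean)  # complete-case mean is pulled down; ML mean is near 2.0
```

Because missingness depends on x, the complete cases over-represent small values of x, so complete-case analysis (listwise deletion) pulls the mean of y downward. The ML estimate uses the x values of the incomplete cases to correct for this.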
About the Data
For this example, we have modified the Holzinger and Swineford (1939) data used in
Example 8. The original dataset (in the SPSS Statistics file Grnt_fem.sav) contains the
scores of 73 girls on six tests, for a total of 438 data values. To obtain a dataset with
missing values, each of the 438 data values in Grnt_fem.sav was deleted with
probability 0.30.
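The deletion step can be sketched as follows. The function name and the placeholder table are hypothetical, and reading Grnt_fem.sav is not shown. With 438 values and a deletion probability of 0.30, about 0.30 × 438 ≈ 131.4 values are expected to be deleted.

```python
import random


def delete_at_random(data, p=0.30, seed=None):
    """Return a copy of data (a list of rows) in which each value is
    independently replaced by None with probability p."""
    rng = random.Random(seed)
    return [[None if rng.random() < p else v for v in row] for row in data]


# Placeholder 73 x 6 table; these are NOT the Holzinger and Swineford scores.
scores = [[float(i * 6 + j) for j in range(6)] for i in range(73)]
incomplete = delete_at_random(scores, p=0.30, seed=2024)
n_missing = sum(v is None for row in incomplete for v in row)
print(n_missing, "of", 73 * 6, "values deleted")  # roughly 131 expected
```

Because whether a value is deleted does not depend on any data value, missingness produced this way is missing completely at random, so it also satisfies the weaker missing-at-random condition that Amos assumes.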