November, 2011
Missing data are pervasive in large public-use databases. Multiple imputation (MI) is an effective methodology for handling the problem. Current state-of-the-art MI procedures often fit fully Bayesian models that assume some joint probability distribution for the underlying complete data. Though theoretically valid, joint modeling may fail to capture important relations among the variables that fall outside the assumed structure. Alternatively, a widely used strategy, "multiple imputation using chained equations (MICE)", first specifies a set of univariate conditional models and then iteratively imputes the missing data based on these conditional models. Though practically flexible, MICE defines a possibly incompatible Gibbs sampler (PIGS) and so carries the inherent problem of possible model incompatibility. We examine and illustrate this problem with simple examples. We then propose a spectrum of imputation strategies, imputing by monotone blocks (IMB), which combines (1) sequential imputation for monotone missing data and (2) a fully conditional strategy like MICE when (1) cannot be applied. The key is to partition an arbitrary missing data pattern into monotone blocks. We further provide some guidelines for choosing good imputation strategies within this spectrum.
Keywords: conditional imputation; Gibbs sampler; incompatibility; MICE; missing data; monotone block; multiple imputation; sequential.
The manuscript is available in PDF format.
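As a rough illustration of the "monotone missing data" notion used in the abstract, the sketch below checks whether a missingness indicator matrix is monotone, that is, whether the variables can be ordered so that a unit missing one variable is also missing every later variable. This is not the manuscript's IMB partitioning procedure; the function name `is_monotone` and the convention that `True` (or 1) marks a missing entry are assumptions made only for this example.

```python
import numpy as np

def is_monotone(missing):
    """Check whether a missingness pattern is monotone (hypothetical helper).

    `missing` is a 2-D array-like with one row per unit and one column per
    variable; a truthy entry marks a missing value (assumed convention).
    Returns (flag, order), where `order` lists column indices from the
    least-missing to the most-missing variable.
    """
    mask = np.asarray(missing, dtype=bool)
    # In a monotone pattern the sets of units missing each variable are
    # nested, so sorting variables by missing count recovers a valid order.
    order = np.argsort(mask.sum(axis=0), kind="stable")
    sorted_mask = mask[:, order]
    # Along each row, once a value is missing all later ones must be too.
    nested = np.all(sorted_mask[:, :-1] <= sorted_mask[:, 1:])
    return bool(nested), order.tolist()

# Columns are out of order here, but the pattern is still monotone.
flag, order = is_monotone([[0, 0, 0],
                           [1, 0, 0],
                           [1, 0, 1]])

# Each variable is missing for a different unit, so no ordering works.
flag_bad, _ = is_monotone([[0, 1, 0],
                           [1, 0, 0]])
```

When the check succeeds, the returned order is the one in which sequential imputation could proceed, each variable conditioned on those before it; when it fails, the pattern would need to be split into several monotone blocks or handled with a fully conditional strategy.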