Multiple Imputation by Ordered Monotone Blocks with Application to the Anthrax Vaccine Research Program

Fan Li, Michela Baccini, Fabrizia Mealli, Constantine E Frangakis, Elizabeth R Zell, Donald B Rubin

Duke University, University of Florence, Johns Hopkins University, CDC, Harvard University

November, 2011

Multiple imputation has become a standard statistical technique for handling missing values, in which imputations are created as random draws from the posterior predictive distribution of the missing data. The Anthrax Vaccine Adsorbed trial data created new challenges for multiple imputation because of the large number of variables of different types and the limited sample size. An intuitive method for handling such complex data is to specify, for each variable with missing values, a univariate conditional distribution given all other variables, in the form of a regression model. Such univariate imputation strategies are valid for monotone missing data, but have the theoretical drawback that the fully conditional distributions are generally incompatible when the missing data are not monotone. To reduce this incompatibility, we propose the "multiple imputation by ordered monotone blocks" approach, which extends the theory for monotone patterns to arbitrary missing-data patterns. The key idea is to break an arbitrary missing-data pattern into a collection of smaller but monotone missing-data patterns. We apply this strategy to impute the missing data in the Anthrax Vaccine Adsorbed trial and evaluate its performance with a novel simulation-based approach. We also propose a method for creating missing values in the simulated data sets that mimics the observed missing-data patterns.
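
To make the notion of a monotone pattern concrete, the following is a minimal Python sketch, not the procedure developed in the manuscript. It assumes a data set stored as a list of rows with `None` marking a missing value, and it treats a pattern as monotone when the variables can be ordered so that the sets of units with missing values are nested. The function names `missing_sets`, `monotone_order`, and `greedy_column_groups` are hypothetical. The greedy grouping is a column-level simplification used only for illustration; the manuscript's blocks need not coincide with such groups.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def missing_sets(data: Sequence[Row]) -> List[frozenset]:
    """For each column, return the set of row indices whose value is None."""
    n_cols = len(data[0])
    return [
        frozenset(i for i, row in enumerate(data) if row[j] is None)
        for j in range(n_cols)
    ]


def monotone_order(data: Sequence[Row]) -> Optional[List[int]]:
    """Return a column order under which the pattern is monotone, else None.

    The pattern is monotone exactly when the per-column missing sets are
    nested, so sorting columns by number of missing values and checking
    consecutive pairs for inclusion is sufficient.
    """
    miss = missing_sets(data)
    order = sorted(range(len(miss)), key=lambda j: len(miss[j]))
    for a, b in zip(order, order[1:]):
        if not miss[a] <= miss[b]:
            return None
    return order


def greedy_column_groups(data: Sequence[Row]) -> List[List[int]]:
    """Greedily split columns into groups that are each monotone.

    Illustrative assumption only: columns are visited from fewest to most
    missing values and appended to the first group whose last column has a
    missing set contained in the current column's missing set.
    """
    miss = missing_sets(data)
    groups: List[List[int]] = []
    for j in sorted(range(len(miss)), key=lambda c: len(miss[c])):
        for group in groups:
            if miss[group[-1]] <= miss[j]:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups
```

Applied to a data set whose pattern is already monotone, `monotone_order` returns a valid variable ordering and `greedy_column_groups` returns a single group. For a non-monotone pattern the first function returns `None` and the second returns two or more groups, each of which could be imputed sequentially by univariate regressions.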

Keywords: Bayesian, conditional distribution, evaluation, imputation, incompatibility, missing data, monotone blocks.


The manuscript is available in PDF format.