Significance testing when using multiple imputation
for missing data and disclosure limitation

Satkartar K. Kinney and Jerome P. Reiter

Duke University

January 2007

Several statistical agencies use, or are considering using, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use data files. For example, an agency can release partially synthetic datasets, which comprise the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. This approach can be coupled with multiple imputation for missing data in a two-stage procedure: first, the agency fills in the missing data to generate m completed datasets; then, it replaces sensitive or identifying values in each completed dataset with n imputed values. In this article, we propose significance tests for multi-component hypotheses with such multiply imputed datasets. We illustrate the performance of these tests with simulation studies.
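
The nested structure produced by the two-stage approach can be sketched as follows. This is only an illustration of the data layout (m completed datasets, each yielding n partially synthetic datasets), not the imputation models used in the manuscript. The function name `two_stage_impute`, the record format, and the simple random draws from observed values are all assumptions made for the example; a real application would draw imputations from fitted predictive models.

```python
import random


def two_stage_impute(records, sensitive_key, m, n, seed=0):
    """Return a nested list of m x n datasets (hypothetical helper).

    records: list of dicts sharing the same keys; None marks a missing value.
    Stage 1 fills each missing value with a random draw from the observed
    values of that variable (a placeholder for a real imputation model).
    Stage 2 replaces every value of `sensitive_key` with a random draw
    from the stage-1 completed values (again, only a placeholder).
    """
    rng = random.Random(seed)
    keys = list(records[0].keys())
    # Observed (non-missing) values per variable, used as the stage-1 donor pool.
    observed = {k: [r[k] for r in records if r[k] is not None] for k in keys}

    nested = []
    for _ in range(m):
        # Stage 1: one completed dataset with the missing values filled in.
        completed = [
            {k: (r[k] if r[k] is not None else rng.choice(observed[k]))
             for k in keys}
            for r in records
        ]
        # Stage 2: n partially synthetic datasets from this completed dataset.
        pool = [r[sensitive_key] for r in completed]
        synthetic = [
            [{**r, sensitive_key: rng.choice(pool)} for r in completed]
            for _ in range(n)
        ]
        nested.append(synthetic)
    return nested
```

Calling `two_stage_impute(records, "income", m=5, n=3)` on a hypothetical list of records would return five groups of three datasets each. The nesting is kept explicit because datasets derived from the same completed dataset share its stage-1 imputations.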

Keywords: Confidentiality; Disclosure; Multiple Imputation; Nonresponse; Significance tests; Synthetic Data


The manuscript is available in PostScript and PDF formats.