Paper Abstract

Sparse Factor-Analytic Probit Models

P. Richard Hahn
Duke University

James G. Scott
University of Texas at Austin

Carlos M. Carvalho
University of Chicago Booth School of Business

November 2009

We describe a class of sparse factor-analytic probit models for multivariate binomial and multinomial data. These models provide a parsimonious lower-dimensional representation of multivariate categorical data by imposing structure upon the covariance matrix of a latent normal parameter. This confers a number of advantages over traditional models. First, the factor-probit model can be used as a powerful exploratory tool for investigating underlying structure in categorical data. Second, it can be used to create well-behaved shrinkage estimators that make the multivariate probit model viable even when the number of variables is large relative to the number of observations. The use of sparsity priors contributes additional regularization and also provides a natural probabilistic framework for investigating the number of factors driving the observed covariation. Finally, the factor model offers significant computational gains, as it circumvents the need to sample from a high-dimensional truncated multivariate normal distribution. After describing the model, we study its performance both on simulated data and on a data set regarding consumer preferences in Scotch whisky that has been previously analyzed in the literature.
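The computational claim can be illustrated with a minimal sketch. It is not the authors' sampler, and it rests on several assumptions: binary data, latent normals z_i = B f_i + e_i with standard normal factors f_i and unit idiosyncratic variances, and y_ij = 1 exactly when z_ij > 0. Under these assumptions, conditioning on the factor scores makes the p latent coordinates independent, so the p-dimensional truncated multivariate normal draw reduces to p univariate truncated normal draws. These are done here by the inverse-CDF method. The function names `simulate_factor_probit` and `draw_latent_given_factors` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Standard normal CDF and inverse CDF, vectorized over NumPy arrays.
PHI = NormalDist()
cdf = np.vectorize(PHI.cdf)
inv_cdf = np.vectorize(PHI.inv_cdf)


def simulate_factor_probit(n, B, rng):
    """Simulate y_ij = 1{z_ij > 0} with z_i = B f_i + e_i.

    Assumes f_i ~ N(0, I_k) and unit idiosyncratic variances e_i ~ N(0, I_p).
    """
    p, k = B.shape
    F = rng.standard_normal((n, k))            # factor scores, one row per observation
    Z = F @ B.T + rng.standard_normal((n, p))  # latent normal parameters
    Y = (Z > 0).astype(int)                    # probit link
    return Y, F


def draw_latent_given_factors(Y, F, B, rng):
    """One Gibbs-style draw of Z given Y, F and B.

    Given the factors, z_ij ~ N(m_ij, 1) independently, truncated to (0, inf)
    when y_ij = 1 and to (-inf, 0] when y_ij = 0. Each entry is therefore an
    independent univariate truncated-normal draw (inverse-CDF method).
    Intended for moderate |m_ij|; no special tail handling is attempted.
    """
    m = F @ B.T                          # conditional means
    u = 1.0 - rng.uniform(size=m.shape)  # uniforms on (0, 1]
    # y = 1: z = m - e', with e' ~ N(0, 1) truncated to (-inf, m)
    pos = m - inv_cdf(u * cdf(m))
    # y = 0: z = m + e, with e ~ N(0, 1) truncated to (-inf, -m]
    neg = m + inv_cdf(u * cdf(-m))
    return np.where(Y == 1, pos, neg)
```

A Gibbs sampler built on this idea would alternate this step with updates of the factor scores and the loadings. Only the latent-variable step is shown, because that is where the factor structure removes the high-dimensional truncated draw.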

We then turn to our motivating example: the analysis of partisanship patterns in sixty years of roll-call votes from the United States Senate. The factor-loadings matrix that emerges from this analysis corresponds to plausible political forces, and the manner in which this matrix changes over time raises interesting questions regarding presidential election cycles. Moreover, the factor scores themselves provide a novel way of ranking senators in terms of the partisanship of their voting patterns.

