Bayesian Factor Regression Models in the "Large p, Small n" Paradigm

Mike West
ISDS, Duke University

June 2002

I discuss Bayesian factor regression models and prediction with very many explanatory variables. Such problems arise in many areas; my motivating applications are in studies of gene expression in functional genomics. I first discuss empirical factor (principal components) regression, and the use of general classes of shrinkage priors, with an example. These models raise foundational questions for Bayesians, and related practical issues, due to the use of design-dependent priors and the need to recover inferences on the effects of the original, high-dimensional predictors. I then discuss latent factor models for high-dimensional variables, and regression approaches in which low-dimensional latent factors are the predictor variables. These models generalise empirical factor regression, provide for more incisive evaluation of factor structure underlying high-dimensional predictors, and resolve the philosophical and practical issues in empirical factor models by casting the latter as limiting special cases. Finally, I turn to questions of prior specification in these models, and introduce sparse latent factor models to induce substantively relevant structure in the high-dimensional distributions of predictors. Embedding such sparse latent factor models in factor regressions provides a novel approach to variable selection with very many predictors. The paper concludes with an example of sparse factor analysis of gene expression data and comments about further research.
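As a rough illustration of the empirical factor (principal components) regression idea described above, the following sketch writes out one simple case. The notation is assumed here for illustration and is not taken from the manuscript: X is an n x p design matrix with n < p and centred columns, the error variance is treated as known, and the independent normal prior on the factor coefficients stands in for the more general shrinkage priors the paper considers.

```latex
% Illustrative notation (assumed, not the manuscript's):
% X is n x p with n < p, columns centred; k = rank(X) <= n;
% u_j is the j-th column of U; sigma^2 is treated as known.
\begin{align*}
X &= U D V^{\top}, \qquad U^{\top}U = V^{\top}V = I_k, \quad
     D = \operatorname{diag}(d_1,\dots,d_k),\ d_j > 0, \\
y &= X\beta + \varepsilon = F\theta + \varepsilon, \qquad
     F = UD,\quad \theta = V^{\top}\beta,\quad
     \varepsilon \sim N(0, \sigma^2 I_n), \\
\theta_j &\sim N(0, \tau_j) \ \text{independently}, \qquad j = 1,\dots,k, \\
E(\theta_j \mid y) &= \frac{d_j^2}{d_j^2 + \sigma^2/\tau_j}\,\hat\theta_j,
     \qquad \hat\theta_j = \frac{u_j^{\top} y}{d_j}, \\
\beta &= V\theta \quad
     \text{(minimum-norm solution of } V^{\top}\beta = \theta\text{)}.
\end{align*}
```

Under these assumptions each least-squares factor coefficient is shrunk towards zero by a factor that depends on its singular value, and the implied prior on the original coefficients, N(0, V diag(tau_1, ..., tau_k) V'), is singular and depends on X through V. This is one way to see both the design dependence of the prior and the need for a map back to the high-dimensional predictors.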

Keywords: Dimension reduction, gene expression analysis, high-dimensional covariates, latent factor models, shrinkage priors


The manuscript is available in PDF format.