Bayesian Factor Regression Models in the "Large p, Small n" Paradigm
Mike West
ISDS, Duke University
June 2002
I discuss Bayesian factor regression models and prediction with
very many explanatory variables. Such problems arise in many areas;
my motivating applications are in studies of gene expression in
functional genomics. I first discuss empirical factor (principal
components) regression, and the use of general classes of shrinkage
priors, with an example. These models raise foundational questions
for Bayesians, and related practical issues, due to the use of
design-dependent priors and the need to recover inferences on the
effects of the original, high-dimensional predictors. I then discuss
latent factor models for high-dimensional variables, and regression
approaches in which low-dimensional latent factors are the predictor
variables. These models generalise empirical factor regression,
provide for more incisive evaluation of factor structure underlying
high-dimensional predictors, and resolve the philosophical and
practical issues in empirical factor models by casting the latter
as limiting special cases. Finally, I turn to questions of prior
specification in these models, and introduce sparse latent factor
models to induce substantively relevant structure in the high-dimensional
distributions of predictors. Embedding such sparse latent factor
models in factor regressions provides a novel approach to variable
selection with very many predictors. The paper concludes with an
example of sparse factor analysis of gene expression data and
comments about further research.
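As a rough illustration of the empirical factor (principal components) regression idea summarised above, the following is a minimal sketch, not the paper's method. It assumes a plain independent normal shrinkage prior on the factor coefficients with a single hypothetical precision ratio `lam`, rather than the general classes of shrinkage priors the paper considers. The function name, the simulated data, and the omission of centering are likewise illustrative assumptions. The sketch takes the singular value decomposition of the n x p predictor matrix, regresses the response on the resulting (at most n) empirical factors, and maps the factor coefficients back to the original p predictors.

```python
import numpy as np

def empirical_factor_regression(X, y, lam=1.0):
    """Shrinkage regression on empirical (principal component) factors.

    Assumption for illustration: an independent N(0, sigma^2 / lam) prior on
    each factor coefficient, so the posterior mean has a ridge-type form.
    """
    # Thin SVD: X = U diag(d) Vt, with at most min(n, p) components.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Drop numerically null directions.
    keep = d > 1e-10 * d.max()
    U, d, Vt = U[:, keep], d[keep], Vt[keep]
    # Empirical factors: the n x k design used in place of the n x p matrix X.
    F = U * d
    # Posterior mean of the factor coefficients under the assumed prior.
    gamma = d * (U.T @ y) / (d ** 2 + lam)
    # Implied coefficients on the original high-dimensional predictors.
    beta = Vt.T @ gamma
    return F, gamma, beta

# Simulated "large p, small n" data (purely illustrative).
rng = np.random.default_rng(0)
n, p = 20, 500
X = rng.standard_normal((n, p))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(n)

F, gamma, beta = empirical_factor_regression(X, y, lam=1.0)
```

Under this assumed prior the implied `beta` coincides with the ridge estimate computed directly from the original predictors, which is one simple way to see how inferences on the original variables can be recovered from a regression on a small number of factors.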
Keywords:
Dimension reduction, gene expression analysis, high-dimensional
covariates, latent factor models, shrinkage priors
The manuscript is available in PDF format.