Sparse Variational Analysis of Large Longitudinal Data Sets

Artin Armagan, David Dunson

Department of Statistical Science, Duke University

September 2009

It is increasingly common to be faced with longitudinal or multi-level data sets that have a large number of predictors and/or a large sample size. Current methods of fitting and inference for mixed effects models tend to perform poorly in such settings. When there are many variables, it is appealing to allow uncertainty in subset selection and to obtain a sparse characterization of the data. Bayesian methods are available to address these goals using Markov chain Monte Carlo (MCMC), but MCMC is very computationally expensive and can be infeasible in large p and/or large n problems. As a fast approximate Bayes solution, we recommend a novel approximation to the posterior relying on variational methods. Variational methods are used to approximate the posterior of the parameters in a decomposition of the variance components, with priors chosen to obtain a sparse solution that allows selection of random effects. The method is evaluated through a simulation study and illustrated in an epidemiological application.

Keywords: Mixed-effects model, Variational approximations, Shrinkage estimation


The manuscript is available in PDF format.