October 2011
Gaussian processes are widely used as priors on function spaces in a variety of nonparametric Bayesian methods, including density estimation, regression, and classification. As shown by van der Vaart and van Zanten (2009), rescaling a homogeneous smooth Gaussian field with an appropriate prior on the scaling parameter leads to a minimax-optimal rate of contraction of the posterior distribution that also adapts to the unknown smoothness of the functional parameter. In multidimensional problems, practitioners often use multiple scaling variables with point-mass mixture priors to allow a subset of variables to drop out of the covariance kernel. In this article, we establish that if the true function belongs to a Hölder space of functions depending on only a subset of the variables, then one indeed obtains the minimax-optimal rate for the reduced number of variables, up to a logarithmic factor, provided one uses an appropriate class of priors on the different scaling variables. Our prior formulation uses no information about the true number of variables or the unknown smoothness of the function, and is hence fully adaptive. We additionally show that the minimax-optimal rate of convergence cannot be obtained using a common bandwidth across all dimensions.
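To make the "multiple scaling variables with point-mass mixture priors" idea concrete, here is a minimal sketch, assuming a squared-exponential base kernel rescaled coordinate-wise as k(x, y) = exp(-Σ_j (a_j (x_j - y_j))²). When a scale a_j is exactly 0, variable j drops out of the kernel. The function names, the Gamma "slab", and its hyperparameters (`shape`, `rate`, `p_include`) are hypothetical choices for illustration only; they are not the specific class of priors studied in the article.

```python
import math
import random


def anisotropic_se_kernel(x, y, scales):
    """Rescaled squared-exponential covariance with one scale per input dimension.

    Computes k(x, y) = exp(-sum_j (a_j * (x_j - y_j))**2). If a_j == 0, the
    j-th coordinate contributes nothing, so the kernel ignores that variable.
    """
    return math.exp(-sum((a * (xi - yi)) ** 2 for a, xi, yi in zip(scales, x, y)))


def sample_scales(d, p_include, rng, shape=2.0, rate=1.0):
    """Draw d scales from a hypothetical point-mass mixture (spike-and-slab) prior.

    Each a_j is exactly 0 with probability 1 - p_include (the variable drops
    out of the kernel); otherwise a_j ~ Gamma(shape, rate). The Gamma slab and
    its hyperparameters are illustrative assumptions, not the paper's prior.
    """
    scales = []
    for _ in range(d):
        if rng.random() < p_include:
            # random.gammavariate takes (shape, scale), so scale = 1 / rate.
            scales.append(rng.gammavariate(shape, 1.0 / rate))
        else:
            scales.append(0.0)
    return scales


def covariance_matrix(points, scales):
    """Gram matrix K[i][k] = k(points[i], points[k]) for the given scales."""
    return [[anisotropic_se_kernel(p, q, scales) for q in points] for p in points]


if __name__ == "__main__":
    rng = random.Random(0)
    scales = sample_scales(d=5, p_include=0.4, rng=rng)
    active = [j for j, a in enumerate(scales) if a > 0.0]
    print("scales:", scales)
    print("variables kept in the kernel:", active)
```

For example, with `scales = [1.5, 0.0, 2.0]` the points `(0.1, 0.9, 0.3)` and `(0.1, -5.0, 0.3)` have covariance exactly 1.0, because they differ only in the second coordinate, which has dropped out. A common-bandwidth kernel corresponds to forcing all a_j to be equal, which cannot switch off individual variables in this way.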
Keywords: Adaptive; Bayesian nonparametrics; Function estimation; Gaussian process; Rate of convergence; Variable selection