BAYESIAN ANALYSIS OF BINARY PREDICTION TREE MODELS FOR RETROSPECTIVELY SAMPLED OUTCOMES

Jennifer Pittman, Erich Huang, Quanli Wang, Joseph R Nevins and Mike West
Duke University

Classification tree models are flexible analysis tools that can evaluate interactions among predictors as well as generate predictions for responses of interest. We present a Bayesian approach to classification tree analysis in the specific context of a binary response with potentially very many candidate predictors, where the data arise from a retrospective case-control design. This scenario is common in studies of gene expression data, which provide a key motivating context. The design issues are incorporated into the tree models through underlying Dirichlet process priors on the distributions of predictor variables conditional on the response. This prior model influences the generation of trees through Bayes factor based tests of association, which determine significant binary partitions of nodes during forward generation of trees. We describe this constructive process and discuss the generation and combination of multiple trees via Bayesian model averaging for prediction. Parameter selection and sensitivity are discussed in the context of two examples, one of which concerns prediction of breast tumour status using high-dimensional gene expression data; the examples demonstrate the exploratory and explanatory uses of such models as well as their primary utility in prediction.
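The Bayes factor test of association mentioned above can be illustrated with a small sketch. It is an illustration under stated assumptions, not the authors' implementation. A Dirichlet process prior on the distribution of a predictor implies a Beta prior on the probability that the predictor falls at or below any fixed threshold, so the sketch places a Beta(a, b) prior on that probability within each response group and compares "the two groups differ" against "the two groups share one probability". The function names, the default a = b = 1, and the exhaustive threshold scan are all assumptions made for illustration.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(n0_low, n0_high, n1_low, n1_high, a=1.0, b=1.0):
    """Log Bayes factor for association at one threshold (illustrative).

    n0_low / n0_high: controls (z = 0) with predictor <= / > threshold.
    n1_low / n1_high: cases (z = 1) with predictor <= / > threshold.
    H1: separate probabilities per group, each with a Beta(a, b) prior.
    H0: one common probability with a Beta(a, b) prior.
    Group totals are treated as fixed, as in retrospective sampling,
    so the binomial coefficients cancel between H1 and H0.
    """
    log_m1 = (log_beta(a + n0_low, b + n0_high) - log_beta(a, b)
              + log_beta(a + n1_low, b + n1_high) - log_beta(a, b))
    log_m0 = (log_beta(a + n0_low + n1_low, b + n0_high + n1_high)
              - log_beta(a, b))
    return log_m1 - log_m0


def best_split(x, z, a=1.0, b=1.0):
    """Scan observed values of x as thresholds; return (threshold, log BF)."""
    best = None
    for tau in sorted(set(x))[:-1]:  # the largest value would leave one side empty
        counts = [[0, 0], [0, 0]]    # counts[group][0 = low, 1 = high]
        for xi, zi in zip(x, z):
            counts[zi][0 if xi <= tau else 1] += 1
        lbf = log_bayes_factor(counts[0][0], counts[0][1],
                               counts[1][0], counts[1][1], a, b)
        if best is None or lbf > best[1]:
            best = (tau, lbf)
    return best
```

In a forward tree-generation scheme of the kind described, a node would be split on the predictor and threshold with the largest Bayes factor, provided that value exceeds a chosen cut-off; the cut-off is one of the parameters whose selection and sensitivity the paper discusses.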


The manuscript is available in PDF format.