BAYESIAN ANALYSIS OF BINARY PREDICTION TREE MODELS FOR RETROSPECTIVELY
SAMPLED OUTCOMES
Jennifer Pittman, Erich Huang, Quanli Wang, Joseph R Nevins and Mike West
Duke University
Classification tree models are flexible analysis tools that can
evaluate interactions among predictors and generate
predictions for responses of interest.
We present a Bayesian approach to classification
tree analysis in the specific context of a binary response with
potentially very many candidate predictors and in which the data arise
from a retrospective case-control design. This scenario is common in studies of
gene expression data, which provide a key motivating context.
The design issues are incorporated into the
tree models via the use of underlying Dirichlet
process priors on the distributions of predictor variables
conditional on the response. This prior model influences the
generation of trees through Bayes factor-based tests of association
that determine significant binary partitions of nodes during the forward
generation of trees. We describe this constructive process and discuss
questions of generating and combining multiple trees via Bayesian
model averaging for prediction. Additional discussion of parameter selection
and sensitivity is given in the context of two examples,
one of which concerns prediction of breast tumour status using
high-dimensional gene expression data; the examples
demonstrate the exploratory/explanatory uses of such models as well
as their primary utility in prediction.
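
As an illustration of the kind of Bayes factor test of association described above, the following is a minimal sketch and not the authors' implementation. It relies on a standard property of the Dirichlet process: the probability that a predictor falls at or below a fixed threshold tau, given the response z, has a Beta distribution. The sketch assumes a common Beta(a, b) prior for both response groups, with a = b = 1 chosen purely for illustration. The function names log_bayes_factor and best_split are hypothetical. Because the design is retrospective, the counts are treated as binomial within each response group, so the binomial coefficients cancel in the ratio.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_bayes_factor(n_low_0, n_high_0, n_low_1, n_high_1, a=1.0, b=1.0):
    """Log Bayes factor for H1 (theta_0 != theta_1) against H0 (theta_0 == theta_1),
    where theta_z = Pr(x <= tau | z).

    n_low_z / n_high_z count observations with response z whose predictor
    value is at or below / above the threshold. The Beta(a, b) prior is an
    assumption made for illustration.
    """
    # H1: independent Beta(a, b) priors on theta_0 and theta_1.
    log_m1 = (log_beta(a + n_low_0, b + n_high_0) - log_beta(a, b)
              + log_beta(a + n_low_1, b + n_high_1) - log_beta(a, b))
    # H0: a single shared theta with a Beta(a, b) prior.
    log_m0 = (log_beta(a + n_low_0 + n_low_1, b + n_high_0 + n_high_1)
              - log_beta(a, b))
    return log_m1 - log_m0


def best_split(x, z, a=1.0, b=1.0):
    """Scan candidate thresholds of one predictor and return (tau, log BF)
    for the threshold with the largest log Bayes factor."""
    best = None
    # The largest observed value is skipped: it would leave one side empty.
    for tau in sorted(set(x))[:-1]:
        counts = [[0, 0], [0, 0]]  # counts[z][0] = low, counts[z][1] = high
        for xi, zi in zip(x, z):
            counts[zi][0 if xi <= tau else 1] += 1
        lbf = log_bayes_factor(counts[0][0], counts[0][1],
                               counts[1][0], counts[1][1], a, b)
        if best is None or lbf > best[1]:
            best = (tau, lbf)
    return best


if __name__ == "__main__":
    x = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
    z = [0, 0, 0, 0, 1, 1, 1, 1]
    print(best_split(x, z))
```

In a forward tree-generation scheme of the kind the abstract describes, a node would be split on a predictor and threshold only when the Bayes factor exceeds a chosen cutoff; the cutoff and the prior parameters are tuning choices whose sensitivity would need to be examined.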