PREDICTION TREE MODELS IN CLINICO-GENOMICS

Jennifer Pittman, Erich Huang, Joseph Nevins and Mike West
Duke University

Classification tree models have the ability to discover and evaluate interactions among multiple predictor variables, and to define flexible, nonlinear predictive tools. We have developed tree models for clinical prediction studies with very high-dimensional gene expression data as candidate predictors. A first context is Bayesian tree models for predicting binary outcomes (as an example) that respect the retrospective (case-control) sampling design common in gene expression studies. A second context is survival modelling for problems such as disease recurrence. Key issues are approaches to tree construction, multiplicities, sensitivity of tree predictions, and the need to average predictions over multiple candidate models. Some of our disease studies use metagene predictors -- aggregate gene expression signatures from clusters of genes -- together with clinical variables. We stress the utility of such tree models for gene and metagene data exploration, and the resulting identification of genes plausibly associated with clinical endpoints, as well as for clinico-genomic prediction.
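
As a rough illustration of averaging predictions over multiple candidate models, the sketch below combines the predicted probabilities of several candidate trees for a single case, weighting each tree by a score. It is a minimal sketch under stated assumptions, not the procedure used in the manuscript: the function name average_predictions and the idea of weighting by an exponentiated log score (for example, a log marginal likelihood) are illustrative choices made here.

```python
import math


def average_predictions(log_scores, probs):
    """Weighted average of predicted probabilities across candidate trees.

    log_scores: one log score per candidate tree (hypothetical input; for
                example a log marginal likelihood). Higher means more weight.
    probs:      the predicted probability of the outcome from each tree,
                all for the same case.
    """
    if not probs or len(log_scores) != len(probs):
        raise ValueError("need one log score per predicted probability")

    # Subtract the largest log score before exponentiating so that the
    # weights stay numerically stable; this does not change their ratios.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)

    # Normalised weighted average of the per-tree predictions.
    return sum(w * p for w, p in zip(weights, probs)) / total
```

With equal scores this reduces to the simple mean of the per-tree predictions; when one tree has a much higher score than the rest, the averaged prediction is close to that tree's prediction.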


The manuscript is available