Graphical model-based gene clustering and metagene expression analysis

Adrian Dobra, Quanli Wang and Mike West

Duke University

July 2004

We describe a novel gene expression analysis method that creates overlapping gene clusters and associated metagene signatures, which aim to characterize the dominant common expression patterns within each cluster. The analysis uses statistical graphical models to identify and estimate patterns of association among gene subsets from gene expression data; clustering is then based on formal estimates of the very sparse covariance matrices arising from these models. Metagene summaries, which are of interest as reduced-dimensional summaries for phenotyping studies, are simply the resulting model-based estimates of the dominant singular factors (principal components) of the population variance matrices within the resulting overlapping clusters. We describe connections between graph-theoretic approaches to exploring gene expression graphical models and the exploration, in biological contexts, of gene subsets represented by identified metagenes, and we illustrate some aspects of the utility of this framework for summary representation of observational gene expression data.

Keywords: Clustering; Gaussian graphical models; Gene expression data
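
As a rough illustration of the metagene idea in the abstract, the following minimal sketch computes the dominant principal component of the variance matrix for one gene cluster. It is not the authors' implementation: the function name `metagene`, the genes-by-samples layout of `expr`, and the index list `cluster` are hypothetical, and the plain sample covariance is used here only as a stand-in for the sparse, model-based covariance estimate that the paper derives from a graphical model.

```python
import numpy as np


def metagene(expr, cluster):
    """Dominant principal component of one gene cluster (illustrative only).

    expr    : array of shape (n_genes, n_samples), expression values
    cluster : list of row indices of the genes in the cluster
    Returns (loadings, scores, frac): gene weights, one metagene value per
    sample, and the fraction of cluster variance the component explains.
    """
    sub = np.asarray(expr, dtype=float)[cluster, :]
    centered = sub - sub.mean(axis=1, keepdims=True)

    # ASSUMPTION: sample covariance stands in for the model-based
    # sparse covariance estimate described in the abstract.
    cov = centered @ centered.T / (sub.shape[1] - 1)

    # eigh returns eigenvalues in ascending order, so the dominant
    # component is the last column of the eigenvector matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    loadings = eigvecs[:, -1]

    # The sign of an eigenvector is arbitrary; fix it for reproducibility.
    if loadings.sum() < 0:
        loadings = -loadings

    scores = loadings @ centered
    frac = eigvals[-1] / eigvals.sum()
    return loadings, scores, frac
```

Because clusters may overlap, the same gene index can appear in several `cluster` lists; each call then yields a separate metagene for its own cluster.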


The manuscript is available in PostScript and PDF formats.