Analysis of Sample Set Enrichment Scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles

Elena Edelman, Alessandro Porrello, Ran Liu, Bala Balakumaran, Andrea Bild, Phillip G. Febbo, and Sayan Mukherjee

March, 2006

Gene expression profiling experiments in cell lines and animal models characterized by specific genetic or molecular perturbations have yielded sets of genes "annotated" by the perturbation. These gene sets can serve as a reference base for interrogating other expression data sets. For example, a new data set in which a specific pathway gene set appears enriched, in the sense that multiple genes in the set show expression changes, can then be annotated by that reference pathway. In this paper we introduce a formal statistical method, Analysis of Sample Set Enrichment Scores (ASSESS), to measure the enrichment of gene sets in each individual sample of an expression data set. This allows us to assay the natural variation of pathway activity in observed gene expression data sets from clinical cancer and other studies. We validate the method and illustrate the biological insights it yields on cell line data, mouse models, and cancer-related data sets. Using oncogenic pathway signatures, we show that gene sets built from the model systems are indeed enriched in those model systems. We use ASSESS for molecular classification by pathways, which provides an accurate classifier that can be interpreted at the level of pathways instead of individual genes. Finally, ASSESS can be used for cross-platform expression models, in which data on the same type of cancer from different platforms are integrated into a space of enrichment scores.


The manuscript is available as a PDF document.