June 2009
A factor modeling framework is developed that both predicts phenotypic or response variation and yields inferred factors that offer insight into the underlying physical or biological processes. The method is general and can be applied to a variety of scientific problems. We focus on modeling complex disease phenotypes (the etiology of cancer) as a motivating example. In this setting, the factors capture gene or protein interaction networks at different scales, where scale refers to the breadth of the interaction network. The method integrates multiscale analysis on graphs and manifolds, developed in applied harmonic analysis, with sparse factor models, a mainstay of applied statistics. Specific findings include the association of the TGF-$\beta$ pathway with prostate cancer recurrence, mediated by cell-cycle control, and the implication of the p27 pathway in cancer progression. In silico perturbation analyses of the inferred multiscale model suggest that the TGF-$\beta$ pathway is a dominant pathway in the control of cell-cycle deregulation in prostate cancer.
Keywords: diffusion geometry, sparse regression, molecular networks, factor models
The manuscript is available in PDF format.