MIXTURE MODELS IN THE EXPLORATION OF STRUCTURE-ACTIVITY RELATIONSHIPS IN DRUG DESIGN

Susan Paddock, Mike West, Stan Young & Merlise Clyde

October 1997

We report on a study of mixture modelling problems arising in the assessment of chemical structure-activity relationships in drug design and discovery. Pharmaceutical research laboratories developing test compounds for screening synthesise many related candidate compounds by linking together collections of basic molecular building blocks, known as monomers. These compounds are tested for biological activity, feeding into screening for further analysis and drug design. The tests also provide data relating compound activity to chemical properties and aspects of the structure of the associated monomers, and our focus here is on studying such relationships as an aid to future monomer selection. The level of chemical activity of a compound depends on the geometry of chemical binding of the test compound to target binding sites on receptor compounds, but the screening tests are unable to identify binding configurations. Hence potentially critical covariate information is missing, as a natural latent variable. The resulting statistical models are then mixtures with respect to this missing information, which complicates data analysis and inference. This paper reports on a study of a two-monomer, two-binding-site framework and associated data. We build structured mixture models that mix linear regression models, which predict chemical effectiveness, with respect to site-binding selection mechanisms. We discuss aspects of modelling and analysis, including problems and pitfalls, and describe results of analyses of a simulated and a real data set. In modelling real data, we are led to critical model extensions that introduce hierarchical random effects components to adequately capture heterogeneities both in the site-binding mechanisms and in the resulting levels of effectiveness of compounds once bound. Comments on current and potential future directions conclude the report.
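As a rough illustration of the kind of structure described above, the following Python sketch simulates a two-monomer, two-binding-site mixture of linear regressions and computes the posterior probability of the latent binding orientation for one compound. It is only a minimal sketch under simplifying assumptions, not the model developed in the report: each monomer is given a single scalar covariate, the two orientations share one set of regression coefficients, the mixing probability is a fixed constant, and there are no hierarchical random effects. All function names, parameter names and numerical values are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mean_activity(x_a, x_b, orientation, beta):
    """Regression mean given the latent binding orientation.

    orientation == 1: monomer A occupies site 1 and monomer B occupies site 2.
    orientation == 0: the monomers are swapped.
    beta = (intercept, site-1 coefficient, site-2 coefficient); assumed values.
    """
    site1, site2 = (x_a, x_b) if orientation == 1 else (x_b, x_a)
    b0, b1, b2 = beta
    return b0 + b1 * site1 + b2 * site2


def simulate(n, pi, beta, sd, seed=0):
    """Draw n compounds; the orientation z is generated but would be unobserved in practice."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x_a, x_b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = 1 if rng.random() < pi else 0
        y = rng.gauss(mean_activity(x_a, x_b, z, beta), sd)
        data.append((x_a, x_b, z, y))
    return data


def orientation_posterior(x_a, x_b, y, pi, beta, sd):
    """Posterior probability that orientation == 1, by Bayes' rule over the two components."""
    w1 = pi * normal_pdf(y, mean_activity(x_a, x_b, 1, beta), sd)
    w0 = (1.0 - pi) * normal_pdf(y, mean_activity(x_a, x_b, 0, beta), sd)
    total = w1 + w0
    # Fall back to the prior if both densities underflow to zero.
    return w1 / total if total > 0.0 else pi


if __name__ == "__main__":
    beta = (1.0, 2.0, -1.0)  # hypothetical coefficients
    data = simulate(n=5, pi=0.7, beta=beta, sd=0.5)
    for x_a, x_b, z, y in data:
        p = orientation_posterior(x_a, x_b, y, 0.7, beta, 0.5)
        print(f"true orientation={z}  posterior P(orientation=1)={p:.3f}")
```

When the two monomer covariates are equal, the two regression means coincide and the data carry no information about the orientation, so the posterior equals the prior mixing probability. This is one simple way to see why the missing binding configuration complicates inference.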

This research is partially supported by Glaxo Wellcome, 5 Moore Drive, RTP, NC 27709, and the National Institute of Statistical Sciences (NISS), http://www.niss.org. The corresponding author is Susan Paddock, Institute of Statistics and Decision Sciences, Duke University, Durham, NC 27708-0251, USA, http://www.stat.duke.edu

The manuscript is available in either PostScript or PDF format.