Bayesian inference and error modeling for experimental data

All experimental assay data contain error arising from uncertainties in initial compositions, dispensed masses or volumes, measurement noise, model fitting error, and intrinsic biological variability. Accounting for this error to produce reliable uncertainty estimates for experimentally derived quantities is critical, because these estimates are the basis for testing hypotheses and building predictive models. Yet it is often difficult even to identify the dominant sources of assay error, let alone propagate them.

Bootstrap modeling of a simple dilution series generated using disposable tips, washable tips, and acoustic dispensing demonstrates how strikingly different bias and variance profiles emerge. Hanson et al. JCAMD 29:1073, 2015.

Our lab uses two primary tools to build predictive models of assay error and to incorporate all sources of error and uncertainty into data analysis: the bootstrap principle and Bayesian inference. The bootstrap principle allows us to simulate sources of error and uncertainty in experimental assays. This provides a means of optimizing assay configurations and ensuring that the resulting data will meet uncertainty objectives, and also a way to assign meaningful experimental uncertainties to assay data that were reported without error estimates and for which primary data are unavailable. Bayesian inference provides a powerful set of tools for incorporating all known sources of error and uncertainty (such as compound purity, dispensing errors, measurement errors, and uncertainty in the underlying biophysical binding model), as well as for model selection, assay optimization based on expected information gain, and characterization of confidence intervals. To sample from Bayesian posteriors, we use the same fundamental computational strategy, Markov chain Monte Carlo (MCMC), that we use in molecular simulations. Many of the advanced sampling techniques we have developed to make molecular simulations more efficient, such as Gibbs sampling replica exchange techniques, nonequilibrium candidate Monte Carlo (NCMC), and multiensemble reweighting techniques like MBAR, can also be employed to make sampling from the Bayesian posterior highly efficient.
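To make the simulation idea concrete, the following is a minimal sketch of propagating dispensing error through a serial dilution by repeated random simulation. It is not the code from Hanson et al.; the function names, the 2% volume CV, the 5% transfer bias, and the volumes and concentrations are all illustrative assumptions chosen only to show how bias and variance accumulate along the series.

```python
import numpy as np

def simulate_dilution_series(n_dilutions=8, c0=10e-3, v_transfer=50e-6,
                             v_diluent=50e-6, volume_cv=0.02,
                             transfer_bias=0.0, n_replicates=5000, seed=0):
    """Simulate replicate serial dilutions with noisy dispensed volumes.

    All defaults are hypothetical. Returns an array of shape
    (n_replicates, n_dilutions) holding the actual concentration
    reached at each dilution step in each simulated replicate.
    """
    rng = np.random.default_rng(seed)
    current = np.full(n_replicates, c0)
    conc = np.empty((n_replicates, n_dilutions))
    for i in range(n_dilutions):
        # Transferred volume: systematic relative bias plus random relative noise.
        vt = v_transfer * (1 + transfer_bias
                           + volume_cv * rng.standard_normal(n_replicates))
        # Diluent volume: random relative noise only (an assumption).
        vd = v_diluent * (1 + volume_cv * rng.standard_normal(n_replicates))
        # Errors compound because each step starts from the previous result.
        current = current * vt / (vt + vd)
        conc[:, i] = current
    return conc

def relative_bias_and_cv(conc, c0=10e-3, dilution_factor=0.5):
    """Compare simulated concentrations with the ideal (error-free) series."""
    ideal = c0 * dilution_factor ** np.arange(1, conc.shape[1] + 1)
    mean = conc.mean(axis=0)
    return mean / ideal - 1.0, conc.std(axis=0) / mean

# Two hypothetical dispensing scenarios: unbiased vs. 5% over-transfer.
unbiased = simulate_dilution_series(transfer_bias=0.0)
biased = simulate_dilution_series(transfer_bias=0.05)
bias_u, cv_u = relative_bias_and_cv(unbiased)
bias_b, cv_b = relative_bias_and_cv(biased)

for step in range(unbiased.shape[1]):
    print(f"step {step + 1}: unbiased bias={bias_u[step]:+.3f} cv={cv_u[step]:.3f} | "
          f"biased bias={bias_b[step]:+.3f} cv={cv_b[step]:.3f}")
```

In this sketch the random error grows with each dilution step in both scenarios, while a systematic transfer error additionally shifts the mean concentration further from the intended value at every step. Swapping in error models specific to a given dispensing technology would be the natural next refinement.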

Current projects in this area include Bayesian analysis of absorbance and fluorescence data collected from standard laboratory plate readers, with the goal of rigorously characterizing confidence intervals in kinase inhibitor binding assays; Bayesian analysis of isothermal titration calorimetry (ITC) data to accurately capture joint uncertainties in thermodynamic parameters of protein-ligand interactions; and Bayesian inference for single-molecule experiments, where small-number statistics can lead to large uncertainties in some kinetic parameters.
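As a rough illustration of how MCMC sampling yields an interval for a binding affinity, here is a minimal sketch using plain random-walk Metropolis on synthetic fluorescence data. It does not use the assaytools API and does not implement the advanced sampling techniques mentioned above. The one-site binding model with ligand in excess, the flat priors, the known noise level, and every name and number are assumptions made for the example.

```python
import numpy as np

def binding_curve(L, log_kd, delta_f):
    """One-site binding signal, assuming free ligand ~ total ligand."""
    return delta_f * L / (np.exp(log_kd) + L)

def log_posterior(theta, L, F_obs, sigma):
    """Gaussian likelihood with flat priors inside broad bounds (assumed)."""
    log_kd, delta_f = theta
    if not (np.log(1e-12) < log_kd < np.log(1e-3)) or not (0.0 < delta_f < 1e4):
        return -np.inf
    resid = F_obs - binding_curve(L, log_kd, delta_f)
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(log_post, theta0, step, n_steps, rng, args):
    """Random-walk Metropolis sampler with Gaussian proposals."""
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta, *args)
    samples = np.empty((n_steps, theta.size))
    n_accept = 0
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = log_post(proposal, *args)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_steps

# Synthetic data from hypothetical "true" parameters.
rng = np.random.default_rng(1)
true_kd, true_delta_f, sigma = 1e-6, 100.0, 2.0
L = np.logspace(-8, -4, 12)  # ligand concentrations (M)
F_obs = binding_curve(L, np.log(true_kd), true_delta_f) \
    + sigma * rng.standard_normal(L.size)

# Start from a crude data-driven guess, then sample.
theta0 = [np.log(np.median(L)), F_obs.max()]
samples, accept_rate = metropolis(log_posterior, theta0,
                                  step=np.array([0.08, 1.0]),
                                  n_steps=20000, rng=rng,
                                  args=(L, F_obs, sigma))

post = samples[5000:]  # discard burn-in
ci_low, ci_high = np.exp(np.percentile(post[:, 0], [2.5, 97.5]))
print(f"acceptance rate: {accept_rate:.2f}")
print(f"Kd 95% credible interval: {ci_low:.2e} to {ci_high:.2e} M")
```

Sampling in log Kd keeps the parameter positive and makes a single proposal step size reasonable across orders of magnitude. A realistic analysis would also treat the noise level, dispensing errors, and compound purity as uncertain rather than fixed, as described above.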

SOFTWARE

assaytools: Bayesian biophysical analysis of absorbance and fluorescence assay data from common plate readers [experimental]
bayesian-itc: Bayesian analysis of isothermal titration calorimetry (ITC) data [experimental]
bhmm: Bayesian hidden Markov model toolkit for analysis of single-molecule experiments and molecular simulation data

RESOURCES

assaytools: general API for describing experimental assays in a human- and computer-readable format

COLLABORATORS

David D. L. Minh (Illinois Institute of Technology): Bayesian modeling of isothermal titration calorimetry (ITC)
Frank Noé (Freie Universität Berlin): Bayesian hidden Markov modeling of single-molecule experiments

PERSONNEL

Sonya M. Hanson (postdoctoral fellow): Bootstrap modeling and Bayesian inference for fluorescence assays
Ariën S. (Bas) Rustenburg (PBSB graduate student): Bayesian inference for isothermal titration calorimetry (ITC)
Chaya Stern (TPCB graduate student): Bayesian modeling of single-molecule experiments

SELECTED PUBLICATIONS

Modeling error in experimental assays using the bootstrap principle: Understanding discrepancies between assays using different dispensing technologies
Sonya M. Hanson, Sean Ekins, and John D. Chodera.
Journal of Computer-Aided Molecular Design 29:1073, 2015. [DOI] [PDF] // IPython notebook [GitHub] // preprint: [bioRxiv]

Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty
John D. Chodera, Phillip Elms, Frank Noé, Bettina Keller, Christian M. Kaiser, Aaron Ewall-Wice, Susan Marqusee, Carlos Bustamante, and Nina Singhal Hinrichs.
preprint: [arXiv] // used in our 2011 Science paper