Automated forcefield benchmarking manuscript on arXiv

Molecular mechanics forcefields are an integral part of molecular simulation.  The quality of any properties computed from molecular simulations is wholly dependent on the quality of the underlying forcefield.  Quantifying how well the forcefields we use can reproduce various physical properties provides insight into expected accuracy in other properties of interest, deficiencies in the forcefield parameters or functional form, and strategies for making systematic improvements.

In a new manuscript posted to arXiv ahead of submission, postdoc Kyle Beauchamp tackles one of the most critical issues in forcefield validation: most of the physical property information one would like to benchmark against is tied up, inaccessible, in paper databases (also known as "books" or "journal articles").  Using the ThermoML Archive from NIST TRC headed by Kenneth Kroenlein (a coauthor on the paper), Kyle shows that the computer-readable data stored in this archive in the IUPAC-standard XML-based ThermoML format contains a wealth of information useful for automated validation (and eventually parameterization) of molecular mechanics forcefields.

As usual, all code used in the production of this manuscript is made available through GitHub. The code makes use of the excellent OpenEye Toolkit, which is available free for academic use that will generate data for the public domain; the GPU-accelerated OpenMM toolkit; and the free AmberTools distribution.

Kyle A. Beauchamp, Julie M. Behr, Ariën S. Rustenburg, Christopher I. Bayly, Kenneth Kroenlein, and John D. Chodera.
Preprint ahead of submission: [arXiv] [PDF] [GitHub]