Learning Predictive Models of Chemistry from Molecular Dynamics Data


Atomistic simulation of complex chemistry using molecular dynamics generates a wealth of data that has traditionally been underutilized. We develop a scale-bridging framework that uses molecular dynamics data to build fast kinetic Monte Carlo models of chemical reaction networks. These kinetic models let us extrapolate rapidly in both time and chemical space: information from molecular dynamics simulations that take weeks to compute enables predictions for a large number of related systems in a matter of minutes.
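To make the idea of a kinetic Monte Carlo model of a reaction network concrete, here is a minimal sketch of the standard Gillespie (direct) method. It is an illustration only, not the framework described above. The function name `gillespie`, the reaction format, and the rate constants are hypothetical; in practice the rate constants would be estimated from molecular dynamics trajectories, a step this sketch does not cover. It also assumes mass-action kinetics with distinct reactant species in each reaction.

```python
import random


def gillespie(counts, reactions, t_end, seed=0):
    """Simulate a reaction network with the Gillespie direct method.

    counts:    dict mapping species name -> initial molecule count
    reactions: list of (rate_constant, reactants, products), where
               reactants and products are tuples of species names
               (assumption: reactants within one reaction are distinct)
    t_end:     simulation stops before time would exceed this value
    Returns a list of (time, counts_snapshot) pairs.
    """
    rng = random.Random(seed)
    counts = dict(counts)
    t = 0.0
    trajectory = [(t, dict(counts))]
    while True:
        # Mass-action propensity: rate constant times reactant counts.
        propensities = []
        for k, reactants, _ in reactions:
            a = k
            for s in reactants:
                a *= counts[s]
            propensities.append(a)
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        dt = rng.expovariate(total)
        if t + dt > t_end:
            break
        t += dt
        # Pick which reaction fires, with probability proportional
        # to its propensity.
        idx = rng.choices(range(len(reactions)), weights=propensities)[0]
        _, reactants, products = reactions[idx]
        for s in reactants:
            counts[s] -= 1
        for s in products:
            counts[s] = counts.get(s, 0) + 1
        trajectory.append((t, dict(counts)))
    return trajectory
```

For example, `gillespie({"A": 100, "B": 0}, [(1.0, ("A",), ("B",))], t_end=50.0)` simulates a single irreversible conversion of A to B with a placeholder rate constant of 1.0. Because each step costs only a few arithmetic operations, such a model can cover long time scales far more cheaply than the underlying atomistic simulation.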

Data-Driven Model Reduction of Nonlinear Dynamical Systems


Nonlinear dynamical systems are at the core of many areas of science, from biology to atmospheric chemistry to electrical circuits. They often consist of hundreds or thousands of components that interact with each other in complex ways. Identifying the components that govern properties of interest is computationally challenging, but it is an essential step towards scientific understanding and discovery. We have developed a data-driven paradigm for efficient model reduction of nonlinear dynamical systems that can find reduced models for systems with hundreds or more components in a matter of minutes.

Machine Learning with Scientific Data


Datasets in the physical sciences are often very challenging: (1) experimental data is expensive and thus necessarily small in quantity, (2) the existing data is often clustered, since experiments tend to be repeated on related materials and systems, and (3) it is often imbalanced, since only positive results tend to be reported in the literature. Frequently, the only datasets available mix experimental and computational data of very different fidelity. We seek to address the challenge of learning from these datasets by making maximal use of the information available from known physics.