Computational chemical prediction has gained ground over the last decade as a way to automate the study of chemical reactivity across multiple substrates. Such predictions require models of the interatomic potential, which can be obtained by fitting to data computed at the quantum-mechanical level. The aim of this work is therefore to propose GapTrain.jl, a fast, automated and broadly applicable approach for developing Gaussian Approximation Potentials from on the order of hundreds to thousands of reference data points.
Machine learning approaches have revolutionized force field-based simulations and can, in principle, be applied to systems drawn from the entire periodic table. Within small chemical subspaces, models can be built using neural networks (NNs), kernel-based methods such as the Gaussian Approximation Potential (GAP) framework or gradient-domain machine learning (GDML), and linear fitting with properly chosen basis functions, each with different data requirements and transferability. GAPs have been used to study a range of elemental, multicomponent inorganic, and gas-phase organic molecular systems and, more recently, condensed-phase systems such as methane and phosphorus. These potentials, while accurate, have required considerable computational effort and human oversight to fit. Indeed, condensed-phase NN and GAP fitting approaches typically require several thousand reference (“ground truth”) evaluations.
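To make the idea of kernel-based fitting concrete, the following is a minimal sketch of Gaussian process regression on a one-dimensional potential, written in Python with NumPy purely for illustration. It is not the GapTrain.jl interface or the fitting code used in this work. Everything in it is an assumption: the Morse curve stands in for quantum-mechanical reference energies, the descriptor is a single interatomic distance, and the kernel length scale and noise level are arbitrary. Production GAP fits use many-body descriptors and sparsification, which this sketch omits.

```python
import numpy as np


def morse(r, d_e=1.0, a=1.5, r_e=1.0):
    """Stand-in 'ground truth': a Morse potential in arbitrary units (assumption)."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2


def sq_exp_kernel(x1, x2, length_scale=0.3):
    """Squared-exponential (Gaussian) kernel between two sets of 1D points."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def fit(r_train, e_train, noise=1e-4):
    """Solve (K + noise^2 I) alpha = E for the regression weights alpha."""
    k = sq_exp_kernel(r_train, r_train)
    return np.linalg.solve(k + noise**2 * np.eye(len(r_train)), e_train)


def predict(r_new, r_train, alpha):
    """Predicted energy: a weighted sum of kernels centred on the training points."""
    return sq_exp_kernel(r_new, r_train) @ alpha


# Twelve hypothetical reference evaluations along a bond-stretch coordinate.
r_train = np.linspace(0.6, 3.0, 12)
e_train = morse(r_train)
alpha = fit(r_train, e_train)

# Compare the fitted model with the reference on points it has not seen.
r_test = np.linspace(0.7, 2.9, 50)
max_error = np.max(np.abs(predict(r_test, r_train, alpha) - morse(r_test)))
print(f"max abs error on test grid: {max_error:.2e}")
```

The cost of such a fit is dominated by the reference evaluations rather than the linear solve, which is why reducing the number of ground truth calculations matters for condensed-phase systems.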
In the present work, with a view to developing potentials for simulating solution-phase reactions, we consider bulk water as a test case and develop a strategy that requires only hundreds of ground truth evaluations in total and no a priori knowledge of the system apart from its molecular composition. We show that this methodology is directly transferable to different chemical systems in the gas phase and in implicit and explicit solvent, focusing on its applicability to a range of scenarios relevant to computational chemistry.
</div>