Papers
arxiv:2506.14492

Accurate Chemistry Collection: Coupled cluster atomization energies for broad chemical space

Published on Jun 17
Authors:
,
,
,
,
,
,
,
,
,
,
,
,

Abstract

The Microsoft Research Accurate Chemistry Collection provides a large dataset of highly accurate total atomization energies to improve data-driven computational chemistry methods.

AI-generated summary

Accurate thermochemical data with sub-chemical accuracy (i.e., within pm1 kcal mol^{-1} from sufficiently accurate experimental or theoretical reference data) is essential for the development and improvement of computational chemistry methods. Challenging thermochemical properties such as heats of formation and total atomization energies (TAEs) are of particular interest because they rigorously test the ability of computational chemistry methods to accurately describe complex chemical transformations involving multiple bond rearrangements. Yet, existing thermochemical datasets that confidently reach this level of accuracy are limited in either size or scope. Datasets with highly accurate reference values include a small number of data points, and larger datasets provide less accurate data or only cover a narrow portion of the chemical space. The existing datasets are therefore insufficient for developing data-driven methods with predictive accuracy over a large chemical space. The Microsoft Research Accurate Chemistry Collection (MSR-ACC) will address this challenge. Here, it offers the MSR-ACC/TAE25 dataset of 76,879 total atomization energies obtained at the CCSD(T)/CBS level via the W1-F12 thermochemical protocol. The dataset is constructed to exhaustively cover chemical space for all elements up to argon by enumerating and sampling chemical graphs, thus avoiding bias towards any particular subspace of the chemical space (such as drug-like, organic, or experimentally observed molecules). With this first dataset in MSR-ACC, we enable data-driven approaches for developing predictive computational chemistry methods with unprecedented accuracy and scope.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2506.14492 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2506.14492 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.