Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Uncertainty Estimation for Molecules: Desiderata and Methods
Authors: Tom Wollschläger, Nicholas Gao, Bertrand Charpentier, Mohamed Amine Ketata, Stephan Günnemann
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our extensive experimental evaluation, we test four different UE with three different backbones and two datasets. In out-of-equilibrium detection, we find LNK yielding up to 2.5 and 2.1 times lower errors in terms of AUC-ROC score than dropout or evidential regression-based methods while maintaining high predictive performance. |
| Researcher Affiliation | Academia | Tom Wollschläger 1, Nicholas Gao 1, Bertrand Charpentier 1, Mohamed Amine Ketata 1, Stephan Günnemann 1. 1 Department of Computer Science & Munich Data Science Institute, Technical University of Munich, Germany. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. Methods are described textually. |
| Open Source Code | Yes | Find our code at cs.cit.tum.de/daml/uncertainty-for-molecules |
| Open Datasets | Yes | Datasets. QM7-X: (Hoja et al., 2021) This dataset covers both equilibrium and non-equilibrium structures. We train on equilibrium structures and non-equilibrium structures are considered OOD data. MD17: (Chmiela et al., 2017) MD17 contains energies and forces for molecular dynamics trajectories of different organic molecules. |
| Dataset Splits | Yes | Table 10. Hyperparameters of the datasets used with all models: val set size 4151 1000 |
| Hardware Specification | No | The paper mentions evaluating runtime on QM7X but does not provide specific details about the hardware used, such as GPU or CPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions using PyTorch-Geometric in the footnote of Table 11 for SchNet, but it does not specify version numbers for PyTorch-Geometric, PyTorch, CUDA, or other key software components, which is necessary for reproducibility. |
| Experiment Setup | Yes | Table 8 and Table 9 provide specific hyperparameters and settings used for training models on QM7-X and MD17 datasets, including learning rate, patience, force weighting factor, number of inducing points, warmup steps, decay steps, decay rate, EMA decay, and dropout locations. |