Transfer learning for atomistic simulations using GNNs and kernel mean embeddings

Authors: John Falk, Luigi Bonati, Pietro Novelli, Michele Parrinello, Massimiliano Pontil

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our approach on a series of realistic datasets of increasing complexity, showing excellent generalization and transferability performance, and improving on methods that rely on GNNs or ridge regression alone, as well as similar fine-tuning approaches.
Researcher Affiliation | Collaboration | John I. Falk (CSML, Istituto Italiano di Tecnologia, Genova, Italy; me@isakfalk.com); Luigi Bonati (Atomistic Simulations, Istituto Italiano di Tecnologia, Genova, Italy; luigi.bonati@iit.it); Pietro Novelli (CSML, Istituto Italiano di Tecnologia, Genova, Italy; pietro.novelli@iit.it); Michele Parrinello (Atomistic Simulations, Istituto Italiano di Tecnologia, Genova, Italy; michele.parrinello@iit.it); Massimiliano Pontil (CSML, Istituto Italiano di Tecnologia, Genova, Italy, and University College London, U.K.; massimiliano.pontil@iit.it)
Pseudocode | Yes | In Algorithm 1 we report the pseudo-code describing our implementation of the training and prediction steps of MEKRR. (An illustrative sketch of such training and prediction steps follows the table.)
Open Source Code | Yes | We make the code repository available at https://github.com/IsakFalk/atomistic_transfer_mekrr.
Open Datasets | Yes | OC20: The Open Catalyst 2020 (OC20) dataset is a large dataset of ab initio calculations aimed at estimating adsorption energies on catalytic surfaces. It comprises 250 million DFT calculations, generated from over 1.2 million relaxation trajectories of different combinations of molecules and surfaces.
Dataset Splits | Yes | We split all the below datasets into train, validation, and test sets using a random 60/20/20 split. (A sketch of such a split follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper mentions software used, such as the SchNet and SCN codebase from [22] and the QUIP/quippy codebase [61, 62] for GAP, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The length-scale of the Gaussian kernel is chosen according to the median heuristic [63]. We denote by MEKRR-(SchNet) and MEKRR-(SCN) the variants using SchNet and SCN node features as inputs, respectively. [...] To initially fit the regularization parameter λ we set α = 0 and cross-validate λ ∈ {10⁻³, …, 10⁻⁹} using the same datasets. (Sketches of the median heuristic and the λ grid search follow the table.)
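
The paper's Algorithm 1 is the authoritative description of MEKRR. As a rough illustration only, here is a minimal NumPy sketch of how training and prediction with kernel mean embeddings and ridge regression can be organized over per-structure matrices of GNN node features; the function names, the uniform averaging over atoms, and the plain linear solve with λ·n ridge scaling are assumptions of this sketch, not the authors' exact implementation.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale):
    """Pairwise Gaussian kernel between atomic feature vectors (rows of X and Y)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def mean_embedding_kernel(atoms_a, atoms_b, lengthscale):
    """Kernel between two structures: the inner product of their kernel mean
    embeddings, i.e. the average of the atom-pair Gaussian kernel values."""
    return gaussian_kernel(atoms_a, atoms_b, lengthscale).mean()

def fit_mekrr(train_structs, energies, lengthscale, lam):
    """Training step: assemble the Gram matrix of mean-embedding kernels and
    solve the regularized kernel ridge regression linear system."""
    n = len(train_structs)
    K = np.array([[mean_embedding_kernel(a, b, lengthscale)
                   for b in train_structs] for a in train_structs])
    return np.linalg.solve(K + lam * n * np.eye(n), np.asarray(energies))

def predict_mekrr(train_structs, weights, test_structs, lengthscale):
    """Prediction step: cross kernel between test and train structures times the weights."""
    K_cross = np.array([[mean_embedding_kernel(t, a, lengthscale)
                         for a in train_structs] for t in test_structs])
    return K_cross @ weights
```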
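
The dataset split is only stated as proportions; one way to realize a random 60/20/20 split over structure indices is sketched below (the seed handling and integer rounding are assumptions, not details from the paper).

```python
import numpy as np

def random_split_60_20_20(n_samples, seed=0):
    """Shuffle sample indices and cut them into 60% train, 20% validation, 20% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.6 * n_samples)
    n_val = int(0.2 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```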
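
The median heuristic and the λ grid search are standard procedures; a hedged sketch of both follows. The fit_fn / predict_fn callables stand in for MEKRR fit and prediction routines (e.g., the sketch above), and the use of validation mean squared error as the selection criterion is an assumption of this sketch.

```python
import numpy as np

def median_heuristic(atom_features):
    """Median heuristic: set the Gaussian length-scale to the median pairwise
    Euclidean distance between (a subsample of) atomic feature vectors."""
    sq_dists = ((atom_features[:, None, :] - atom_features[None, :, :]) ** 2).sum(-1)
    upper = sq_dists[np.triu_indices_from(sq_dists, k=1)]
    return float(np.median(np.sqrt(upper)))

def select_lambda(fit_fn, predict_fn, train, val, lengthscale,
                  grid=tuple(10.0 ** -p for p in range(3, 10))):
    """Cross-validate the ridge parameter over a log grid (1e-3 down to 1e-9),
    keeping the value with the lowest validation mean squared error."""
    X_tr, y_tr = train
    X_va, y_va = val
    best_lam, best_mse = None, np.inf
    for lam in grid:
        weights = fit_fn(X_tr, y_tr, lengthscale, lam)
        preds = predict_fn(X_tr, weights, X_va, lengthscale)
        mse = float(np.mean((preds - np.asarray(y_va)) ** 2))
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam
```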