Transfer learning for atomistic simulations using GNNs and kernel mean embeddings
Authors: John Falk, Luigi Bonati, Pietro Novelli, Michele Parrinello, Massimiliano Pontil
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our approach on a series of realistic datasets of increasing complexity, showing excellent generalization and transferability performance, and improving on methods that rely on GNNs or ridge regression alone, as well as similar fine-tuning approaches. |
| Researcher Affiliation | Collaboration | John I. Falk, CSML, Istituto Italiano di Tecnologia, Genova, Italy (me@isakfalk.com); Luigi Bonati, Atomistic Simulations, Istituto Italiano di Tecnologia, Genova, Italy (luigi.bonati@iit.it); Pietro Novelli, CSML, Istituto Italiano di Tecnologia, Genova, Italy (pietro.novelli@iit.it); Michele Parrinello, Atomistic Simulations, Istituto Italiano di Tecnologia, Genova, Italy (michele.parrinello@iit.it); Massimiliano Pontil, CSML, Istituto Italiano di Tecnologia, Genova, Italy, and University College London, U.K. (massimiliano.pontil@iit.it) |
| Pseudocode | Yes | In Algorithm 1 we report the pseudo-code describing our implementation of the training and prediction steps of MEKRR. |
| Open Source Code | Yes | We make the code repository available at https://github.com/IsakFalk/atomistic_transfer_mekrr. |
| Open Datasets | Yes | OC20: The Open Catalyst (OC) 20 is a large dataset of ab initio calculations aimed at estimating adsorption energies on catalytic surfaces. It comprises 250 million DFT calculations, generated from over 1.2 million relaxation trajectories of different combinations of molecules and surfaces. |
| Dataset Splits | Yes | We split all the below datasets into a train, validation, and test set using random splitting of 60/20/20. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions the software used, such as the SchNet and SCN codebase from [22] and the QUIP/quippy codebase [61, 62] for GAP, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The length-scale of the Gaussian kernel is chosen according to the median heuristic [63]. We will denote MEKRR-(SchNet) and MEKRR-(SCN) the variants using SchNet and SCN node features as inputs, respectively. [...] To initially fit the regularization parameter λ we set α = 0 and cross-validate λ ∈ {10⁻³, …, 10⁻⁹} using the same datasets. |
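
The Dataset Splits and Experiment Setup rows describe three generic steps: a random 60/20/20 split, a Gaussian kernel whose length-scale comes from the median heuristic, and a search over the ridge regularization parameter λ. The sketch below illustrates those steps with plain kernel ridge regression on synthetic feature vectors. It is not the authors' MEKRR implementation: the data, the function names, the `n * lam` scaling of the ridge term, the use of a single hold-out validation set in place of cross-validation, and the reading of the λ grid as 10⁻³ down to 10⁻⁹ are all assumptions made for illustration.

```python
import numpy as np

def split_indices(n, rng, fractions=(0.6, 0.2, 0.2)):
    """Randomly partition range(n) into train/validation/test index arrays."""
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A and B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

def median_heuristic(X):
    """Length-scale set to the median distance between distinct points of X."""
    upper = np.triu_indices(len(X), k=1)
    return float(np.sqrt(np.median(sq_dists(X, X)[upper])))

def gaussian_kernel(A, B, length_scale):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    return np.exp(-sq_dists(A, B) / (2.0 * length_scale**2))

def fit_krr(K, y, lam):
    """Kernel ridge regression coefficients: solve (K + n*lam*I) c = y."""
    n = len(y)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic stand-in data (hypothetical; the paper uses GNN node features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

train, val, test = split_indices(len(X), rng)

# Length-scale is computed from the training points only.
ell = median_heuristic(X[train])
K_train = gaussian_kernel(X[train], X[train], ell)
K_val = gaussian_kernel(X[val], X[train], ell)
K_test = gaussian_kernel(X[test], X[train], ell)

# Assumed grid: 1e-3, 1e-4, ..., 1e-9; pick the value with lowest validation RMSE.
lambdas = [10.0 ** (-k) for k in range(3, 10)]
val_rmse = {lam: rmse(K_val @ fit_krr(K_train, y[train], lam), y[val]) for lam in lambdas}
best_lam = min(val_rmse, key=val_rmse.get)

coef = fit_krr(K_train, y[train], best_lam)
test_rmse = rmse(K_test @ coef, y[test])
print(f"length-scale={ell:.3f}  best lambda={best_lam:.0e}  test RMSE={test_rmse:.3f}")
```

Computing the length-scale and selecting λ without touching the test indices keeps the final test error an honest estimate. Replacing the hold-out step with k-fold cross-validation would only change how `val_rmse` is filled in.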