A Self-Attention Ansatz for Ab-initio Quantum Chemistry
Authors: Ingrid von Glehn, James S. Spencer, David Pfau
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Here we present an evaluation of the Psiformer on a wide variety of benchmark systems." and "We test the Psiformer on a wide variety of benchmark systems for quantum chemistry and find that it is significantly more accurate than existing neural network Ansatzes of roughly the same size." |
| Researcher Affiliation | Industry | "Ingrid von Glehn, James S. Spencer & David Pfau {ingridvg,jamessspencer,pfau}@deepmind.com" |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | The code is available under the Apache License 2.0 as part of the Fermi Net repo at https://github.com/deepmind/ferminet. |
| Open Datasets | Yes | "Geometries and CCSD(T)/CBS reference energies are taken from Pfau et al. (2020)." and "small molecules (4-30 electrons) from the G3 database (Curtiss et al., 2000)." |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits as it describes a first-principles computational chemistry approach rather than a typical supervised machine learning setup with fixed data splits. |
| Hardware Specification | Yes | All models were implemented in JAX (Bradbury et al., 2018) based upon the public Fermi Net (Spencer et al., 2020b) and KFAC implementations (Botev & Martens, 2022), and trained in parallel using between 16 and 64 A100 GPUs, depending on the system size. |
| Software Dependencies | No | The paper mentions software like JAX and KFAC implementations with citations, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Table 4 shows the default hyperparameters used for training all models implemented in this work. Note that Pfau et al. (2020) took the sum over gradients across the batch on each device and averaged over devices, whereas here the gradients are averaged over the entire batch (see the illustrative sketch below this table). |
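
The gradient-averaging convention quoted in the Experiment Setup row (averaging over the entire batch, rather than summing per device and averaging over devices) can be illustrated with a short data-parallel JAX sketch. This is not the authors' implementation: the loss function, learning rate, and plain SGD step are hypothetical placeholders (the paper trains with KFAC), and only the `pmean`-based averaging across devices is the point.

```python
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Hypothetical stand-in for the variational Monte Carlo objective;
    # takes the mean over the local (per-device) shard of the batch.
    preds = jnp.dot(batch, params)
    return jnp.mean(preds ** 2)

@functools.partial(jax.pmap, axis_name='devices')
def update(params, batch):
    # Each device computes the loss and gradient on its local shard.
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # Because loss_fn already averages over the local shard, a pmean across
    # devices gives the gradient averaged over the entire global batch,
    # matching the convention described in the table above.
    grads = jax.lax.pmean(grads, axis_name='devices')
    loss = jax.lax.pmean(loss, axis_name='devices')
    return params - 1e-3 * grads, loss  # plain SGD step in place of KFAC

# Usage sketch: replicate parameters and shard the batch across local devices.
n_dev = jax.local_device_count()
params = jax.device_put_replicated(jnp.zeros((4,)), jax.local_devices())
batch = jnp.ones((n_dev, 8, 4))  # leading axis indexes devices
params, loss = update(params, batch)
```

Under equal shard sizes, this `pmean` of per-shard means equals the mean over the whole batch, which is the distinction drawn against the per-device sum-then-average convention of Pfau et al. (2020).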