Higher-Rank Irreducible Cartesian Tensors for Equivariant Message Passing

Authors: Viktor Zaverkin, Francesco Alesiani, Takashi Maruyama, Federico Errica, Henrik Christiansen, Makoto Takamoto, Nicolas Weber, Mathias Niepert

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through empirical evaluations on various benchmark data sets, we consistently observe on-par or better performance than that of state-of-the-art spherical and Cartesian models." (Section 5, Experiments and results)
Researcher Affiliation | Collaboration | 1 NEC Laboratories Europe, 2 NEC Italy, 3 University of Stuttgart
Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but it does not include explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | The source code is available on GitHub and can be accessed via this link: https://github.com/nec-research/ictp.
Open Datasets | Yes | All data sets used in this study are publicly available: rMD17 (https://doi.org/10.6084/m9.figshare.12672038.v3), MD22 (http://www.sgdml.org), 3BPA (https://github.com/davkovacs/BOTNet-datasets), acetylacetone (https://github.com/davkovacs/BOTNet-datasets), and TaVCrW (https://doi.org/10.18419/darus-3516).
Dataset Splits | Yes | "... with 50 additional configurations randomly sampled for early stopping." "We randomly selected an additional set of 500 structures for each molecule in the data set for early stopping, while the remaining configurations were reserved for testing the final models." "... with further 50 used for early stopping." "For the ICTP, MACE, and GM-NN models, we randomly selected a validation data set of 500 structures from the corresponding training data sets." (A minimal split sketch is given after this table.)
Hardware Specification | Yes | All ICTP and MACE models employed in this work were trained on a single NVIDIA A100 GPU with 80 GB of RAM.
Software Dependencies | No | The paper mentions software components like "AMSGrad variant of Adam" and "SiLU non-linearities" but does not provide specific version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | Unless stated otherwise, we used two message-passing layers and irreducible Cartesian tensors or spherical tensors of a maximal rank of lmax = 3 to embed the directional information of atomic distance vectors. For ICTP models with the full (ICTPfull) and symmetric (ICTPsym) product basis and MACE, we employ 256 uncoupled feature channels. Exceptions include our experiments with the 3BPA data set, aimed at investigating scaling and computational cost, and the TaVCrW experiments, where we used eight and 32 feature channels, respectively. For ICTP models with the symmetric product basis evaluated in the latent feature space (ICTPsym+lt), we use 64 coupled feature channels for the Cartesian product basis and 256 for two-body features. Radial features are derived from eight Bessel basis functions with a polynomial envelope for the cutoff with p = 5 [60]. These features are fed into a fully connected NN of size [64, 64, 64]. We apply SiLU non-linearities to the outputs of the hidden layers [87, 88]. The readout function of the first message-passing layer is implemented as a linear layer. The readout function of the second layer is a single-layer fully connected NN with 16 hidden neurons. A cutoff radius of 5.0 Å is used across all data sets except MD22, where we used a cutoff radius of 5.5 Å for the double-walled nanotube and 6.0 Å for the other molecules in the data set. All models for rMD17, 3BPA, and acetylacetone were trained for 2000 epochs using the AMSGrad variant of Adam [89], with default parameters of β1 = 0.9, β2 = 0.999, and ε = 10^-8. For MD22 and TaVCrW, all models were trained for 1000 epochs. For the rMD17, 3BPA, and acetylacetone data sets, we used a learning rate of 0.01 and a batch size of 5. For MD22 and TaVCrW, we again chose a learning rate of 0.01 but mini-batch sizes of 2 and 32, respectively. (A configuration sketch follows this table.)
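
For the Dataset Splits row, the following is a minimal sketch of how such a random train/validation/test split could be produced, assuming the configurations are held in a Python list; the function name `split_configurations` and the seed handling are illustrative assumptions, not the authors' code.

```python
import random

def split_configurations(configs, n_train, n_valid, seed=0):
    """Randomly partition configurations into train/validation/test subsets.

    The validation subset (e.g., 500 structures per molecule, or 50 for the
    smaller splits quoted above) is used for early stopping; everything left
    after removing train and validation structures is reserved for testing.
    """
    rng = random.Random(seed)
    indices = list(range(len(configs)))
    rng.shuffle(indices)
    train = [configs[i] for i in indices[:n_train]]
    valid = [configs[i] for i in indices[n_train:n_train + n_valid]]
    test = [configs[i] for i in indices[n_train + n_valid:]]
    return train, valid, test
```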
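
For the Experiment Setup row, the quoted hyperparameters can be collected in a short, hedged sketch; `torch.optim.Adam(amsgrad=True)` and `torch.nn.SiLU` are the standard PyTorch counterparts of the components named in the quote, while the dictionary keys and the placeholder model are hypothetical and not the authors' implementation.

```python
import torch

# Hyperparameters quoted in the paper's experiment setup (rMD17/3BPA/acetylacetone defaults);
# the dictionary keys and the placeholder model below are illustrative, not the authors' API.
config = {
    "num_message_passing_layers": 2,
    "l_max": 3,                      # maximal tensor rank for directional embeddings
    "num_feature_channels": 256,     # ICTPfull/ICTPsym and MACE; 8 (3BPA scaling) or 32 (TaVCrW) in exceptions
    "num_bessel_functions": 8,       # radial features with polynomial envelope, p = 5
    "radial_mlp_hidden": [64, 64, 64],
    "cutoff_radius": 5.0,            # in Å; 5.5/6.0 Å for the MD22 systems
    "readout_hidden_neurons": 16,    # readout of the second message-passing layer
    "epochs": 2000,                  # 1000 for MD22 and TaVCrW
    "learning_rate": 0.01,
    "batch_size": 5,                 # 2 for MD22, 32 for TaVCrW
}

# Placeholder network standing in for the actual ICTP model; SiLU is applied
# to hidden-layer outputs, as in the quoted radial MLP.
model = torch.nn.Sequential(
    torch.nn.Linear(config["num_bessel_functions"], 64),
    torch.nn.SiLU(),
    torch.nn.Linear(64, 1),
)

# AMSGrad variant of Adam with the quoted default moment parameters.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=config["learning_rate"],
    betas=(0.9, 0.999),
    eps=1e-8,
    amsgrad=True,
)
```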