Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs

Authors: Yi-Lun Liao, Tess Smidt

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Despite their widespread success in various domains, Transformer networks have yet to perform well across datasets in the domain of 3D atomistic graphs such as molecules even when 3D-related inductive biases like translational invariance and rotational equivariance are considered. In this paper, we demonstrate that Transformers can generalize well to 3D atomistic graphs and present Equiformer, a graph neural network leveraging the strength of Transformer architectures and incorporating SE(3)/E(3)-equivariant features based on irreducible representations (irreps). First, we propose a simple and effective architecture by only replacing original operations in Transformers with their equivariant counterparts and including tensor products. Using equivariant operations enables encoding equivariant information in channels of irreps features without complicating graph structures. With minimal modifications to Transformers, this architecture has already achieved strong empirical results. Second, we propose a novel attention mechanism called equivariant graph attention, which improves upon typical attention in Transformers through replacing dot product attention with multi-layer perceptron attention and including non-linear message passing. With these two innovations, Equiformer achieves competitive results to previous models on QM9, MD17 and OC20 datasets. (See the equivariant-operation sketch after this table.)
Researcher Affiliation | Academia | Yi-Lun Liao, Tess Smidt; Massachusetts Institute of Technology; {ylliao, tsmidt}@mit.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It uses diagrams such as Figures 1 and 2 to illustrate the architecture and operations.
Open Source Code | Yes | The code for reproducing the results of Equiformer on the QM9, MD17 and OC20 datasets is available at https://github.com/atomicarchitects/equiformer.
Open Datasets | Yes | The QM9 dataset (Ruddigkeit et al., 2012; Ramakrishnan et al., 2014) (CC BY-NC SA 4.0 license) consists of 134k small molecules... The MD17 dataset (Chmiela et al., 2017; Schütt et al., 2017; Chmiela et al., 2018) (CC BY-NC) consists of molecular dynamics simulations of small organic molecules... The Open Catalyst 2020 (OC20) dataset (Chanussot* et al., 2021) (Creative Commons Attribution 4.0 License) consists of larger atomic systems...
Dataset Splits | Yes | The data partition we use has 110k, 10k, and 11k molecules in training, validation and testing sets. (QM9) We use 950 and 50 different configurations for training and validation sets and the rest for the testing set. (MD17) There are 460k, 100k and 100k structures in training, validation, and testing sets, respectively. (OC20) (See the split sketch after this table.)
Hardware Specification | Yes | We use one A6000 GPU with 48GB to train each model and summarize the computational cost of training for one epoch as follows. (QM9) We use one A5000 GPU with 24GB to train different models for each molecule. (MD17) We use two A6000 GPUs, each with 48GB, to train models when IS2RS is not included during training. We use four A6000 GPUs to train Equiformer models when the IS2RS node-level auxiliary task is adopted during training. (OC20)
Software Dependencies | No | The paper mentions "e3nn (Geiger et al., 2022)" and "PyTorch (Paszke et al., 2019)", but specific version numbers for these software dependencies are not provided in the main text. (See the version-check snippet after this table.)
Experiment Setup | Yes | Training Details. Please refer to Sec. D.1 in the appendix for details on architecture, hyper-parameters and training time. (QM9) Table 8 summarizes the hyper-parameters for the QM9 dataset, Table 11 for the MD17 dataset, and Table 14 for the OC20 dataset under the setting of training without the IS2RS auxiliary task.
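The abstract describes Equiformer's core ingredient: irreps features combined through tensor products with equivariant operations. As a rough illustration of what one such operation looks like in e3nn (which the paper cites), the following is a minimal, hypothetical sketch of a single tensor-product message step. It is not the authors' Equiformer code; the module name, irreps choices, and sizes are assumptions, and the actual equivariant graph attention (MLP attention, non-linear message passing) is considerably more involved.

```python
# Minimal sketch of an equivariant tensor-product message step using e3nn's o3
# API (o3.Irreps, o3.FullyConnectedTensorProduct, o3.spherical_harmonics).
# This illustrates the kind of operation the abstract describes, NOT the
# authors' Equiformer implementation; all names and sizes here are assumptions.
import torch
from e3nn import o3


class ToyEquivariantMessage(torch.nn.Module):
    def __init__(self, irreps_node="16x0e + 8x1o", irreps_sh="1x0e + 1x1o"):
        super().__init__()
        self.irreps_node = o3.Irreps(irreps_node)
        self.irreps_sh = o3.Irreps(irreps_sh)
        # Tensor product couples node features with spherical harmonics of the
        # relative position, producing rotationally equivariant messages.
        self.tp = o3.FullyConnectedTensorProduct(
            self.irreps_node, self.irreps_sh, self.irreps_node
        )

    def forward(self, x_src, edge_vec):
        # Spherical harmonics of the edge direction vectors.
        sh = o3.spherical_harmonics(
            self.irreps_sh, edge_vec, normalize=True, normalization="component"
        )
        return self.tp(x_src, sh)


# Usage: 5 edges; node features with 16 scalar + 8 vector channels (16*1 + 8*3 = 40 dims).
x_src = torch.randn(5, 40)
edge_vec = torch.randn(5, 3)
msg = ToyEquivariantMessage()(x_src, edge_vec)
print(msg.shape)  # torch.Size([5, 40])
```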
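The Dataset Splits row quotes 110k/10k/11k molecules for QM9. Below is a hedged sketch of materializing splits of those sizes, assuming the torch_geometric QM9 loader; the authors' actual split indices and ordering are not given in this report, so the random permutation is only a placeholder.

```python
# Sketch of QM9 splits with the sizes quoted above (110k train / 10k valid /
# remaining ~11k test), assuming torch_geometric's QM9 dataset class.
# The shuffle seed and exact index files used by the authors are NOT known here.
import torch
from torch_geometric.datasets import QM9

dataset = QM9(root="data/QM9")        # ~131k molecules after standard cleaning
perm = torch.randperm(len(dataset))   # placeholder shuffle; the authors' ordering may differ
train_set = dataset[perm[:110_000]]
valid_set = dataset[perm[110_000:120_000]]
test_set = dataset[perm[120_000:]]    # the remaining ~11k molecules

print(len(train_set), len(valid_set), len(test_set))
```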
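Since exact PyTorch and e3nn versions are not stated in the paper, a reproduction would typically record the versions actually installed in its environment. A small snippet for doing so, assuming both packages expose __version__ (PyTorch does; recent e3nn releases do as well):

```python
# Record the installed dependency versions for a reproduction log.
import torch
import e3nn

print("torch:", torch.__version__)
print("e3nn:", e3nn.__version__)
print("CUDA available:", torch.cuda.is_available())
```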