EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations
Authors: Yi-Lun Liao, Brandon M. Wood, Abhishek Das, Tess Smidt
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | ABSTRACT: Equivariant Transformers such as Equiformer have demonstrated the efficacy of applying Transformers to the domain of 3D atomistic systems. However, they are limited to small degrees of equivariant representations due to their computational complexity. In this paper, we investigate whether these architectures can scale well to higher degrees. Starting from Equiformer, we first replace SO(3) convolutions with eSCN convolutions to efficiently incorporate higher-degree tensors. Then, to better leverage the power of higher degrees, we propose three architectural improvements: attention re-normalization, separable S² activation, and separable layer normalization. Putting this all together, we propose EquiformerV2, which outperforms previous state-of-the-art methods on the large-scale OC20 dataset by up to 9% on forces and 4% on energies, offers better speed-accuracy trade-offs, and reduces the number of DFT calculations needed for computing adsorption energies by 2×. Additionally, EquiformerV2 trained on only the OC22 dataset outperforms GemNet-OC trained on both OC20 and OC22 datasets, achieving much better data efficiency. Finally, we compare EquiformerV2 with Equiformer on the QM9 and OC20 S2EF-2M datasets to better understand the performance gain brought by higher degrees. |
| Researcher Affiliation | Collaboration | Yi-Lun Liao¹, Brandon Wood², Abhishek Das², Tess Smidt¹ (¹Massachusetts Institute of Technology; ²FAIR, Meta; equal contribution) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/atomicarchitects/equiformer_v2 |
| Open Datasets | Yes | Experiments show that EquiformerV2 outperforms previous state-of-the-art methods on the large-scale OC20 dataset (Chanussot et al., 2021) by up to 9% on forces and 4% on energies, offers better speed-accuracy trade-offs, and reduces the number of DFT calculations needed for computing adsorption energies by 2×. Additionally, EquiformerV2 trained on only the OC22 dataset outperforms GemNet-OC trained on both OC20 and OC22 datasets, achieving much better data efficiency. Finally, we compare EquiformerV2 with Equiformer on the QM9 and OC20 S2EF-2M datasets to better understand the performance gain brought by higher degrees. |
| Dataset Splits | Yes | All models are trained on the 2M split of the OC20 S2EF dataset, and errors are averaged over the four validation sub-splits. |
| Hardware Specification | Yes | Throughput is reported as the number of structures processed per GPU-second during training and measured on V100 GPUs. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We summarize the hyper-parameters for the base model setting on OC20 S2EF-2M dataset and the main results on OC20 S2EF-All and S2EF-All+MD datasets in Table 7. |
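The abstract quoted above names three architectural improvements (attention re-normalization, separable S² activation, and separable layer normalization) without showing how they operate. Below is a minimal, hedged sketch of the separable layer-normalization idea only, written in PyTorch: degree-0 (invariant) channels are normalized with a standard LayerNorm, while higher-degree channels are rescaled by an RMS norm with no mean shift so the operation stays equivariant. The module name, tensor layout, and affine parameters here are assumptions made for illustration, not the authors' implementation; the actual code is in the linked repository.

```python
import torch
import torch.nn as nn


class SeparableLayerNormSketch(nn.Module):
    """Illustrative sketch of separable layer normalization (not the authors' code).

    Assumes node features of shape (num_nodes, (lmax + 1) ** 2, num_channels),
    with spherical-harmonic components ordered by degree l = 0..lmax.
    Degree-0 channels get a standard LayerNorm; higher-degree channels are
    rescaled by their RMS norm only (no mean shift), preserving equivariance.
    """

    def __init__(self, num_channels: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scalar_norm = nn.LayerNorm(num_channels, eps=eps)
        # Learnable per-channel scale for the higher-degree part (an assumption).
        self.higher_scale = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scalars, higher = x[:, :1, :], x[:, 1:, :]
        scalars = self.scalar_norm(scalars)
        # RMS over the (m, channel) dimensions; keepdim for broadcasting.
        rms = higher.pow(2).mean(dim=(1, 2), keepdim=True).clamp_min(self.eps).sqrt()
        higher = higher / rms * self.higher_scale
        return torch.cat([scalars, higher], dim=1)


# Example usage: lmax = 4 gives (4 + 1) ** 2 = 25 spherical-harmonic components.
x = torch.randn(8, 25, 128)
print(SeparableLayerNormSketch(128)(x).shape)  # torch.Size([8, 25, 128])
```

The other two components (attention re-normalization and the separable S² activation), as well as the eSCN convolutions, are implemented in the open-source repository listed in the table above.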