GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Authors: Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, Liwei Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted to demonstrate the power of GeoMFormer. All empirical results show that GeoMFormer achieves strong performance on both invariant and equivariant tasks of different types and scales. In this section, we empirically investigate our GeoMFormer on extensive tasks. |
| Researcher Affiliation | Collaboration | ¹School of EECS, Peking University; ²National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; ³Microsoft Research AI4Science; ⁴Center for Machine Learning Research, Peking University. |
| Pseudocode | No | The paper provides detailed mathematical formulations and architectural descriptions of GeoMFormer, including equations for self-attention, cross-attention, and FFN modules, but it does not include a distinct 'Algorithm' or 'Pseudocode' block or figure. A hedged structural sketch of such a block is given after this table. |
| Open Source Code | Yes | Code and models will be made publicly available at https://github.com/c-tl/GeoMFormer. |
| Open Datasets | Yes | On the Open Catalyst 2020 (OC20) dataset (Chanussot et al., 2021)..., PCQM4Mv2 is one of the largest quantum chemical property datasets from the OGB Large-Scale Challenge (Hu et al., 2021)., Molecule3D (Xu et al., 2021) is a newly proposed large-scale dataset..., N-Body Simulation (Satorras et al., 2021), MD17 (Chmiela et al., 2017). |
| Dataset Splits | Yes | The training set for both tasks is composed of over 460,328 catalyst-adsorbate complexes. To better evaluate the model's performance, the validation and test sets consider in-distribution (ID) and out-of-distribution settings that use unseen adsorbates (OOD-Ads), catalysts (OOD-Cat), or both (OOD-Both), containing approximately 200,000 complexes in total. (OC20), The dataset contains 3,899,647 molecules in total and is split into training, validation, and test sets with the splitting ratio 6:2:2. (Molecule3D), The dataset contains 3,000 trajectories for training, 2,000 trajectories for validation, and 2,000 trajectories for testing. (N-body Simulation), All models are trained on only 1,000 samples, from which 50 are used for validation. (MD17). An illustrative split sketch is given after this table. |
| Hardware Specification | Yes | The model is trained on 16 NVIDIA Tesla V100 GPUs. (OC20, PCQM4Mv2), The model is trained on 1 NVIDIA Tesla V100 GPU. (N-body Simulation). |
| Software Dependencies | No | The paper mentions software components like 'AdamW as the optimizer', 'GELU activation', 'SiLU activation', 'Layer Normalization (LN)', and 'RDKit', but it does not specify exact version numbers for these software packages or libraries. |
| Experiment Setup | Yes | Our GeoMFormer model consists of 12 layers. The dimension of hidden layers and feed-forward layers is set to 768. The number of attention heads is set to 48. The number of Gaussian Basis kernels is set to 128. We use AdamW as the optimizer and set the hyper-parameter ϵ to 1e-6 and (β1, β2) to (0.9, 0.98). The gradient clip norm is set to 5.0. The peak learning rate is set to 2e-4. The batch size is set to 128. The dropout ratios for the input embeddings, attention matrices, and hidden representations are set to 0.0, 0.1, and 0.0, respectively. The weight decay is set to 0.0. The model is trained for 1 million steps with a 60k-step warm-up stage. A hedged sketch of this optimization setup is given after this table. |
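
To make the Pseudocode row concrete, here is a minimal, shape-level sketch of a block with the three named components (self-attention, cross-attention, FFN) and the reported dimensions (768 hidden, 48 heads). The class name `TwoStreamBlock`, the pre-LayerNorm residual layout, and the choice to feed the cross-attention invariant scalars derived from the equivariant stream are illustrative assumptions, not the authors' implementation; the code at the repository above is authoritative.

```python
# A minimal sketch of an invariant-stream block updated by self-attention,
# cross-attention against the other stream, and an FFN. Module names and the
# scalarization of the equivariant stream are assumptions for illustration.
import torch
import torch.nn as nn

class TwoStreamBlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 48):
        super().__init__()
        # Self-attention within the invariant stream.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention: invariant queries, keys/values from invariant
        # scalars derived from the equivariant stream (e.g. vector norms).
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1, self.ln2, self.ln3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x_inv: torch.Tensor, x_equ_scalar: torch.Tensor) -> torch.Tensor:
        # x_inv:        (batch, atoms, dim) invariant features
        # x_equ_scalar: (batch, atoms, dim) scalars from the equivariant stream
        h = self.ln1(x_inv)
        x_inv = x_inv + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.ln2(x_inv)
        x_inv = x_inv + self.cross_attn(h, x_equ_scalar, x_equ_scalar, need_weights=False)[0]
        return x_inv + self.ffn(self.ln3(x_inv))

if __name__ == "__main__":
    x_inv = torch.randn(2, 10, 768)
    x_equ_scalar = torch.randn(2, 10, 768)
    print(TwoStreamBlock()(x_inv, x_equ_scalar).shape)  # torch.Size([2, 10, 768])
```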
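The Dataset Splits row reports a 6:2:2 training/validation/test ratio for Molecule3D. A minimal sketch of such a split, assuming a random partition (the split criterion is not quoted in the evidence) and a stand-in dataset:

```python
# A minimal 6:2:2 split sketch. The random partition and the stand-in
# TensorDataset are assumptions; substitute the real Molecule3D dataset.
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.randn(100, 8))  # stand-in for the real dataset
n = len(dataset)
n_train, n_val = int(0.6 * n), int(0.2 * n)
n_test = n - n_train - n_val  # remainder goes to test so sizes sum to n
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=torch.Generator().manual_seed(0)
)
```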
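The Experiment Setup row fully specifies the optimizer, so it can be sketched directly. The AdamW hyper-parameters, gradient-clip norm, warm-up length, and step count below are as reported; the linear warm-up and the post-warm-up decay shape are assumptions, since the schedule shape is not quoted.

```python
# A sketch of the reported optimization setup: AdamW(eps=1e-6,
# betas=(0.9, 0.98)), peak LR 2e-4, weight decay 0.0, gradient clipping
# at norm 5.0, 1M total steps with a 60k-step warm-up.
import torch

model = torch.nn.Linear(8, 1)  # stand-in; substitute the actual network
optimizer = torch.optim.AdamW(
    model.parameters(), lr=2e-4, betas=(0.9, 0.98), eps=1e-6, weight_decay=0.0
)

WARMUP, TOTAL = 60_000, 1_000_000

def lr_lambda(step: int) -> float:
    # Linear warm-up to the peak LR, then linear decay to zero (assumed shape).
    if step < WARMUP:
        return step / WARMUP
    return max(0.0, (TOTAL - step) / (TOTAL - WARMUP))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# At each training step, clip gradients at the reported norm before stepping:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```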