Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models
Authors: Yili Wang, Kaixiong Zhou, Ninghao Liu, Ying Wang, Xin Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The extensive experiments on six datasets with different tasks demonstrate the superiority of GraphSAM, especially in optimizing the model update process. |
| Researcher Affiliation | Academia | 1School of Artificial Intelligence, Jilin University, China 2Institute for Medical Engineering & Science, Massachusetts Institute of Technology, USA 3School of Computing, University of Georgia, USA 4College of Computer Science and Technology, Jilin University, China |
| Pseudocode | Yes | Algorithm 1 GraphSAM |
| Open Source Code | Yes | The code is in: https://github.com/YL-wang/GraphSAM/tree/graphsam. |
| Open Datasets | Yes | We consider six public benchmark datasets: BBBP, Tox21, Sider, and ClinTox for the classification task, and ESOL and Lipophilicity for the regression task. We evaluate all models on a random split as suggested by MoleculeNet (Wu et al., 2018). |
| Dataset Splits | Yes | We evaluate all models on a random split as suggested by MoleculeNet (Wu et al., 2018), and split the datasets into training, validation, and testing with a 0.8/0.1/0.1 ratio. |
| Hardware Specification | Yes | All the experiments are implemented by PyTorch, and run on an NVIDIA TITAN-RTX (24G) GPU. |
| Software Dependencies | No | The paper mentions 'All the experiments are implemented by PyTorch' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We only adjust the specific hyperparameters introduced by GraphSAM: (1) the smoothing parameter of the moving average β is tuned within {0.9, 0.99, 0.999}, (2) the initial size of the gradient ball ρ is selected from {0.05, 0.01, 0.005, 0.001}, (3) ρ's update rate λ is searched over {1, 3, 5}, and (4) the scheduler's modification scale γ is chosen from {0.5, 0.2, 0.1}. Due to space limitations, we place our experiments on hyperparameters in Appendix A.6. All the experiments are implemented by PyTorch, and run on an NVIDIA TITAN-RTX (24G) GPU. |
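
The pseudocode row above quotes "Algorithm 1 GraphSAM". For orientation, here is a minimal PyTorch sketch of the standard two-step SAM update (Foret et al., 2021) that GraphSAM builds on: perturb the weights within a gradient ball of radius ρ, then descend using the gradient taken at the perturbed point. GraphSAM's contribution is to replace the extra gradient computation with a cheaper moving-average approximation (the β hyperparameter quoted in the experiment-setup row), which this sketch does not implement; `sam_step` and its arguments are illustrative names, not the authors' API.

```python
import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One vanilla SAM update: ascend within a gradient ball of radius rho,
    then descend using the gradient taken at the perturbed weights."""
    # First forward/backward pass: gradient at the current weights.
    loss_fn(model(inputs), targets).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    # Ascent step: move each parameter to the (approximate) worst point
    # on the rho-ball around the current weights.
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()
    # Second pass: sharpness-aware gradient at the perturbed weights.
    loss_fn(model(inputs), targets).backward()
    # Restore the original weights, then take the descent step.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
```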
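
The dataset-splits row reports a random 0.8/0.1/0.1 split following MoleculeNet. A minimal sketch of such a split is below; `random_split` and its defaults are illustrative, not the authors' code.

```python
import random

def random_split(samples, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle indices and cut them into train/valid/test partitions."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_train = int(frac_train * len(samples))
    n_valid = int(frac_valid * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    valid = [samples[i] for i in idx[n_train:n_train + n_valid]]
    test = [samples[i] for i in idx[n_train + n_valid:]]
    return train, valid, test
```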
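
The experiment-setup row lists the search grids for GraphSAM's four hyperparameters. The sketch below iterates that grid exhaustively; only the value sets come from the paper, while `grid_search` and the user-supplied `train_and_evaluate` callback are hypothetical placeholders.

```python
from itertools import product

BETAS   = [0.9, 0.99, 0.999]          # moving-average smoothing parameter β
RHOS    = [0.05, 0.01, 0.005, 0.001]  # initial size of the gradient ball ρ
LAMBDAS = [1, 3, 5]                   # update rate λ for ρ
GAMMAS  = [0.5, 0.2, 0.1]             # scheduler modification scale γ

def grid_search(train_and_evaluate):
    """Return the best combination under a user-supplied
    train_and_evaluate(beta, rho, lam, gamma) -> validation score."""
    best_score, best_cfg = float("-inf"), None
    for beta, rho, lam, gamma in product(BETAS, RHOS, LAMBDAS, GAMMAS):
        score = train_and_evaluate(beta, rho, lam, gamma)
        if score > best_score:
            best_score, best_cfg = score, (beta, rho, lam, gamma)
    return best_cfg, best_score
```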