Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models

Authors: Yili Wang, Kaixiong Zhou, Ninghao Liu, Ying Wang, Xin Wang

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental The extensive experiments on six datasets with different tasks demonstrate the superiority of GraphSAM, especially in optimizing the model update process.
Researcher Affiliation Academia 1School of Artificial Intelligence, Jilin University, China 2Institute for Medical Engineering & Science, Massachusetts Institute of Technology, USA 3School of Computing, University of Georgia, USA 4College of Computer Science and Technology, Jilin University, China
Pseudocode Yes Algorithm 1: GraphSAM
Open Source Code Yes The code is at: https://github.com/YL-wang/GraphSAM/tree/graphsam.
Open Datasets Yes We consider six public benchmark datasets: BBBP, Tox21, Sider, and ClinTox for the classification task, and ESOL and Lipophilicity for the regression task. We evaluate all models on a random split as suggested by MoleculeNet (Wu et al., 2018).
Dataset Splits Yes We evaluate all models on a random split as suggested by MoleculeNet (Wu et al., 2018), and split the datasets into training, validation, and testing with a 0.8/0.1/0.1 ratio.
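The 0.8/0.1/0.1 random split described above can be sketched as follows. This is a minimal illustration of such a split, not the authors' code; the function name `random_split` and the fixed seed are assumptions for the example.

```python
import random

def random_split(n_samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and partition them into train/valid/test
    subsets according to the given ratios (here 0.8/0.1/0.1)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(ratios[0] * n_samples)
    n_valid = int(ratios[1] * n_samples)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]   # remainder goes to the test set
    return train, valid, test

train, valid, test = random_split(2000)
print(len(train), len(valid), len(test))  # 1600 200 200
```

Any rounding remainder falls into the test split, so the three subsets always cover every sample exactly once.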
Hardware Specification Yes All the experiments are implemented in PyTorch, and run on an NVIDIA TITAN-RTX (24G) GPU.
Software Dependencies No The paper mentions 'All the experiments are implemented in PyTorch' but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup Yes We only adjust the specific hyperparameters introduced by GraphSAM: (1) the smoothing parameter of the moving average β is tuned within {0.9, 0.99, 0.999}, (2) the initial size of the gradient ball ρ is selected from {0.05, 0.01, 0.005, 0.001}, (3) ρ's update rate λ is searched over {1, 3, 5}, (4) and the scheduler's modification scale γ = {0.5, 0.2, 0.1}. Due to space limitations, we place our experiments on hyperparameters in Appendix A.6. All the experiments are implemented in PyTorch, and run on an NVIDIA TITAN-RTX (24G) GPU.
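The four hyperparameter ranges above define a small grid. The sketch below enumerates it with `itertools.product`; the grid values come from the quoted setup, but the enumeration itself is illustrative and not the authors' tuning code.

```python
import itertools

# Hyperparameter ranges quoted for GraphSAM; the dict keys are
# illustrative names for the paper's symbols beta, rho, lambda, gamma.
grid = {
    "beta":  [0.9, 0.99, 0.999],          # moving-average smoothing
    "rho":   [0.05, 0.01, 0.005, 0.001],  # initial gradient-ball size
    "lam":   [1, 3, 5],                   # rho update rate
    "gamma": [0.5, 0.2, 0.1],             # scheduler modification scale
}

# Cartesian product of all ranges: one dict per candidate configuration.
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print(len(configs))  # 3 * 4 * 3 * 3 = 108
```

Enumerating the full grid shows the search space is modest (108 configurations), which is consistent with tuning only the parameters GraphSAM introduces rather than the base model.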