GeomGCL: Geometric Graph Contrastive Learning for Molecular Property Prediction
Authors: Shuangli Li, Jingbo Zhou, Tong Xu, Dejing Dou, Hui Xiong (pp. 4541-4549)
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on seven real-life molecular datasets demonstrate the effectiveness of our proposed GeomGCL against state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Shuangli Li1,2*, Jingbo Zhou2, Tong Xu1, Dejing Dou2, Hui Xiong3; 1School of Computer Science and Technology, University of Science and Technology of China; 2Business Intelligence Lab, Baidu Research; 3Artificial Intelligence Thrust, The Hong Kong University of Science and Technology |
| Pseudocode | No | The paper describes the model framework and methods using text and mathematical equations, but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | The code is available here: https://github.com/PaddlePaddle/PaddleHelix/tree/dev/research/geomgcl |
| Open Datasets | Yes | To evaluate the performance of our proposed model against existing molecular representation learning methods, we use seven molecular datasets from MoleculeNet (Wu et al. 2018), including four physiology datasets (ClinTox, SIDER, Tox21, and ToxCast) for graph classification tasks, as well as three physical chemistry datasets (ESOL, FreeSolv, and Lipophilicity) for graph regression tasks. |
| Dataset Splits | Yes | As recommended by the MoleculeNet benchmarks (Wu et al. 2018), we randomly split each dataset into training, validation, and testing sets with a ratio of 0.8/0.1/0.1. |
| Hardware Specification | Yes | We train all models on 24 Intel CPUs and Tesla K80 GPUs. |
| Software Dependencies | No | We implement our model based on the deep learning platform PaddlePaddle. No version number is given for PaddlePaddle, and no other software components with version numbers are listed. |
| Experiment Setup | Yes | We use the Adam optimizer for model training with a learning rate of 1e-3. We set the batch size to 256 for contrastive learning and 32 for fine-tuning, with the scale parameter τ = 0.5. The hidden size of all models is set to 128. The cutoff distance dθ is determined (4 Å or 5 Å) according to the size of the molecules in each dataset. We set the dimension K of the geometric embedding to 64. The numbers of 3D angle domains and global distance domains are both set to 4. The balancing hyper-parameter λ is set to 0.01 according to the performance on the validation set. |
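The 0.8/0.1/0.1 random split described in the Dataset Splits row can be sketched as below. This is a minimal illustration in plain Python, not the authors' splitting code; the function name and seed are hypothetical.

```python
import random

def random_split(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle a dataset and slice it into train/valid/test subsets
    according to the given ratios (hypothetical helper, for illustration)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(ratios[0] * n)
    n_valid = int(ratios[1] * n)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test

train, valid, test = random_split(range(1000))
```

With 1000 molecules this yields 800/100/100 disjoint subsets, matching the reported ratio.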
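The excerpt above does not show how the scale parameter τ = 0.5 enters the contrastive objective. As an illustration only, here is a generic NT-Xent-style loss in NumPy, together with the reported hyper-parameters collected in a config dictionary; all identifiers are hypothetical and not taken from the GeomGCL code.

```python
import numpy as np

# Hyper-parameters reported in the Experiment Setup row
# (key names are illustrative, not the authors' identifiers).
CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size_contrastive": 256,
    "batch_size_finetune": 32,
    "temperature": 0.5,        # scale parameter tau
    "hidden_size": 128,
    "geom_embedding_dim": 64,  # dimension K
    "num_angle_domains": 4,
    "num_distance_domains": 4,
    "lambda_balance": 0.01,
}

def nt_xent_loss(z1, z2, tau=CONFIG["temperature"]):
    """Generic NT-Xent-style contrastive loss between two views.

    z1, z2: (N, D) arrays of embeddings for the same N molecules under
    two views; row i of z1 and row i of z2 form a positive pair, while
    the other rows of the opposite view act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # (N, N) scaled similarities
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # positives on the diagonal
```

A smaller τ sharpens the softmax over negatives; the paper's τ = 0.5 is a common default for this family of losses.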