GeomGCL: Geometric Graph Contrastive Learning for Molecular Property Prediction

Authors: Shuangli Li, Jingbo Zhou, Tong Xu, Dejing Dou, Hui Xiong (pp. 4541–4549)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on seven real-life molecular datasets demonstrate the effectiveness of our proposed GeomGCL against state-of-the-art baselines.
Researcher Affiliation | Collaboration | Shuangli Li (1,2), Jingbo Zhou (2), Tong Xu (1), Dejing Dou (2), Hui Xiong (3). 1: School of Computer Science and Technology, University of Science and Technology of China; 2: Business Intelligence Lab, Baidu Research; 3: Artificial Intelligence Thrust, The Hong Kong University of Science and Technology.
Pseudocode | No | The paper describes the model framework and methods using text and mathematical equations, but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | The code is available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/research/geomgcl
Open Datasets | Yes | To evaluate the performance of our proposed model against existing molecular representation learning methods, we use seven molecular datasets from MoleculeNet (Wu et al. 2018): four physiology datasets (ClinTox, SIDER, Tox21, and ToxCast) for graph classification tasks, and three physical chemistry datasets (ESOL, FreeSolv, and Lipophilicity) for graph regression tasks.
Dataset Splits | Yes | As recommended by the MoleculeNet benchmarks (Wu et al. 2018), we randomly split each dataset into training, validation, and testing sets with a ratio of 0.8/0.1/0.1.
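The reported 0.8/0.1/0.1 random split can be sketched with Python's standard library alone; the function name, seed, and toy data below are illustrative assumptions, not taken from the paper or its code.

```python
import random

def random_split(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle a dataset and split it into train/valid/test by the given ratios.

    Note: seed and ratios are illustrative defaults; the paper only states
    the 0.8/0.1/0.1 ratio, not the shuffling details.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(ratios[0] * n)
    n_valid = int(ratios[1] * n)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test

# Toy usage: 1000 placeholder molecule indices -> 800/100/100 split.
train, valid, test = random_split(range(1000))
```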
Hardware Specification | Yes | We train all models on 24 Intel CPUs and Tesla K80 GPUs.
Software Dependencies | No | We implement our model based on the deep learning platform PaddlePaddle. No version number is given, and no other software components with version numbers are listed.
Experiment Setup | Yes | We use the Adam optimizer for model training with a learning rate of 1e-3. We set the batch size to 256 for contrastive learning and 32 for finetuning, with the scale parameter τ = 0.5. The hidden size of all models is set to 128. The cutoff distance dθ is set to 4 Å or 5 Å according to the size of the molecules in each dataset. We set the dimension K of the geometric embedding to 64. The numbers of 3D angle domains and global distance domains are both set to 4. The balancing hyper-parameter λ is set to 0.01 according to performance on the validation set.
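The reported hyper-parameters can be collected into a single configuration, a useful starting point for a reproduction attempt. This is a minimal sketch: the dictionary keys and the size threshold in `cutoff_distance` are assumptions (the paper states only that dθ is 4 Å or 5 Å depending on molecule size, without giving the rule).

```python
# Hyper-parameters as reported in the paper's experiment setup.
CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size_contrastive": 256,  # pre-training with contrastive learning
    "batch_size_finetune": 32,
    "tau": 0.5,                     # scale (temperature) parameter
    "hidden_size": 128,
    "geom_embedding_dim": 64,       # dimension K of geometric embedding
    "num_angle_domains": 4,         # 3D angle domains
    "num_dist_domains": 4,          # global distance domains
    "lambda": 0.01,                 # balancing hyper-parameter
}

def cutoff_distance(num_atoms, size_threshold=20):
    """Pick cutoff dθ in Angstroms by molecule size.

    The 20-atom threshold is a hypothetical choice for illustration;
    the paper does not specify how 'size' maps to 4 Å vs 5 Å.
    """
    return 5.0 if num_atoms > size_threshold else 4.0
```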