Motif-based Graph Self-Supervised Learning for Molecular Property Prediction
Authors: Zaixi Zhang, Qi Liu, Hao Wang, Chengqiang Lu, Chee-Kong Lee
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various downstream benchmark tasks show that our methods outperform all state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Zaixi Zhang1, Qi Liu1, Hao Wang1, Chengqiang Lu1, Chee-Kong Lee2. 1: Anhui Province Key Lab of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China. 2: Tencent America |
| Pseudocode | Yes | The pseudo code of the training process is included in the Appendix. |
| Open Source Code | Yes | The implementation is publicly available at https://github.com/zaixizhang/MGSSL. |
| Open Datasets | Yes | We use 250k unlabeled molecules sampled from the ZINC15 database [38] for self-supervised pre-training tasks. As for the downstream fine-tuning tasks, we consider 8 binary classification benchmark datasets contained in MoleculeNet [45]. (See the loading sketch after the table.) |
| Dataset Splits | Yes | The split for train/validation/test sets is 80%:10%:10%. (See the split sketch after the table.) |
| Hardware Specification | Yes | All experiments are conducted on Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using "the open-source package RDKit [22]" but does not specify a version number for it or any other software dependencies. |
| Experiment Setup | Yes | In the pre-training process, GNNs are pre-trained for 100 epochs with the Adam optimizer and a learning rate of 0.001. In the fine-tuning stage, we train for 100 epochs and report the testing score with the best cross-validation performance. The hidden dimension is set to 300 and the batch size is set to 32 for both pre-training and fine-tuning. (See the training-loop sketch after the table.) |
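
The Open Datasets row names ZINC15 for pre-training and 8 MoleculeNet binary classification benchmarks for fine-tuning. As a minimal sketch, assuming PyTorch Geometric is available, one convenient way to fetch a MoleculeNet benchmark is shown below; this is an illustrative shortcut, not the authors' preprocessing pipeline (their repository handles data preparation).

```python
# Hedged sketch: fetch one of the 8 MoleculeNet binary classification
# benchmarks via PyTorch Geometric. This is an assumed convenience path,
# not the authors' data pipeline.
from torch_geometric.datasets import MoleculeNet

# 'BBBP' is one MoleculeNet classification benchmark; others include
# 'Tox21', 'BACE', 'SIDER', and 'ClinTox'.
dataset = MoleculeNet(root='data/MoleculeNet', name='BBBP')
print(len(dataset), dataset.num_node_features)
```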
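
The Dataset Splits row only gives the 80%:10%:10% ratio; it does not say whether the split is random or scaffold-based. The sketch below implements a plain random split under that assumption; `split_indices` and the seed are illustrative names and choices, not taken from the paper.

```python
import random

def split_indices(n, train_frac=0.8, valid_frac=0.1, seed=0):
    """Return train/validation/test index lists in an 80:10:10 ratio.

    NOTE: the report only states the ratio; a uniform random split is
    assumed here (MoleculeNet benchmarks often use scaffold splits instead).
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n)
    n_valid = int(valid_frac * n)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test

# Example on a hypothetical dataset of 2,500 molecules.
train_idx, valid_idx, test_idx = split_indices(2500)
print(len(train_idx), len(valid_idx), len(test_idx))  # 2000 250 250
```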
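
The Experiment Setup row fixes four hyperparameters: 100 epochs, the Adam optimizer with learning rate 0.001, hidden dimension 300, and batch size 32. Below is a minimal training-loop sketch wired to exactly those values; the MLP stand-in model, dummy tensors, and binary cross-entropy loss are assumptions for illustration, where the paper would use a molecular GNN over graph-structured data.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters exactly as reported in the Experiment Setup row.
HIDDEN_DIM = 300      # hidden dimension
BATCH_SIZE = 32       # batch size for pre-training and fine-tuning
LEARNING_RATE = 1e-3  # Adam learning rate
EPOCHS = 100          # pre-training / fine-tuning epochs

# Stand-in for the paper's GNN: a small MLP over 16-d dummy features so the
# loop runs end to end. Replace with a molecular GNN (e.g. GIN) and a real
# dataset to reproduce the reported setup.
model = nn.Sequential(nn.Linear(16, HIDDEN_DIM), nn.ReLU(), nn.Linear(HIDDEN_DIM, 1))
optimizer = Adam(model.parameters(), lr=LEARNING_RATE)
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(256, 16)                   # dummy molecular features
y = torch.randint(0, 2, (256, 1)).float()  # dummy binary labels
loader = DataLoader(TensorDataset(x, y), batch_size=BATCH_SIZE, shuffle=True)

for epoch in range(EPOCHS):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```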