Motif-based Graph Self-Supervised Learning for Molecular Property Prediction

Authors: Zaixi Zhang, Qi Liu, Hao Wang, Chengqiang Lu, Chee-Kong Lee

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on various downstream benchmark tasks show that our methods outperform all state-of-the-art baselines."
Researcher Affiliation | Collaboration | Zaixi Zhang (1), Qi Liu (1), Hao Wang (1), Chengqiang Lu (1), Chee-Kong Lee (2); 1: Anhui Province Key Lab of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China; 2: Tencent America
Pseudocode | Yes | "The pseudo codes of the training process is included in the Appendix."
Open Source Code | Yes | "The implementation is publicly available at https://github.com/zaixizhang/MGSSL."
Open Datasets | Yes | "we use 250k unlabeled molecules sampled from the ZINC15 database [38] for self-supervised pre-training tasks. As for the downstream finetune tasks, we consider 8 binary classification benchmark datasets contained in MoleculeNet [45]."
Dataset Splits | Yes | "The split for train/validation/test sets is 80% : 10% : 10%." (A split sketch follows the table.)
Hardware Specification | Yes | "All experiments are conducted on Tesla V100 GPUs."
Software Dependencies | No | The paper mentions using "the open-source package RDKit [22]" but does not specify a version number for it or for any other software dependency.
Experiment Setup | Yes | "In the process of pre-training, GNNs are pre-trained for 100 epochs with Adam optimizer and learning rate 0.001. In the finetuning stage, we train for 100 epochs and report the testing score with the best cross-validation performance. The hidden dimension is set to 300 and the batch size is set to 32 for pre-training and finetuning." (A training-loop sketch follows the table.)
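
The quoted Dataset Splits response records only the ratios, not the splitting strategy (MoleculeNet benchmarks are often evaluated with scaffold splits, but the quote does not say which splitter was used). As a minimal sketch, the Python below partitions a list of SMILES strings 80/10/10 with a seeded random shuffle; the function name and the random strategy are illustrative assumptions, not the paper's documented procedure.

```python
import random

def split_80_10_10(smiles_list, seed=42):
    """Split SMILES strings into 80/10/10 train/val/test subsets.

    The paper reports only the ratios; the random shuffle used here
    is an assumption (MoleculeNet tasks often use scaffold splits).
    """
    indices = list(range(len(smiles_list)))
    random.Random(seed).shuffle(indices)
    n_train = int(0.8 * len(indices))
    n_val = int(0.1 * len(indices))
    train = [smiles_list[i] for i in indices[:n_train]]
    val = [smiles_list[i] for i in indices[n_train:n_train + n_val]]
    test = [smiles_list[i] for i in indices[n_train + n_val:]]
    return train, val, test
```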
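To make the reported Experiment Setup concrete, here is a minimal PyTorch training-loop sketch wired to the quoted settings (Adam optimizer, learning rate 0.001, 100 epochs, hidden dimension 300, batch size 32). The stand-in MLP, synthetic tensors, and MSE loss are placeholders so the sketch runs on its own; the paper's actual GNN encoder, ZINC15 molecules, and motif-based self-supervised objective live in the MGSSL repository linked above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters quoted in the Experiment Setup row above.
HIDDEN_DIM = 300
BATCH_SIZE = 32
EPOCHS = 100
LEARNING_RATE = 1e-3

# Stand-in encoder and data: the real model is a GNN over molecular
# graphs; an MLP over dummy features keeps this sketch self-contained.
model = nn.Sequential(
    nn.Linear(16, HIDDEN_DIM),
    nn.ReLU(),
    nn.Linear(HIDDEN_DIM, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
dataset = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
loss_fn = nn.MSELoss()  # placeholder for the motif-generation loss

for epoch in range(EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```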