Molecular Contrastive Learning with Chemical Element Knowledge Graph

Authors: Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, Huajun Chen

AAAI 2022 (pp. 3968-3976) | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct extensive experiments to examine the proposed method by answering the following questions: Q1: How does KCL perform compared with state-of-the-art methods for molecular property prediction? Q2: Does the knowledge-guided graph augmentation in Module 1 learn better representations than general augmentations? Q3: How do knowledge feature initialization and graph encoders in Module 2 affect KCL? Q4: How useful are the self-supervised contrastive learning and hard negative strategy in Module 3? Q5: How can we interpret KCL(KMPNN) from a domain-specific perspective?
Researcher Affiliation | Academia | 1 College of Computer Science and Technology, Zhejiang University; 2 ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University; 3 AZFT Joint Lab for Knowledge Engine; 4 School of Software Technology, Zhejiang University; 5 Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University; 6 Innovation Center in Zhejiang University, State Key Laboratory of Component-Based Chinese Medicine; 7 Westlake Laboratory of Life Sciences and Biomedicine; {fangyin, qiang.zhang.cs, haihong825, zhuangxiang, 231sm, wenzhang2015, qinandming, zhuo.chen, fanxh, huajunsir}@zju.edu.cn
Pseudocode | Yes | Algorithm 1 describes the KMPNN encoding process.
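
For context, the sketch below illustrates one knowledge-aware message passing step in the spirit of the KMPNN described by Algorithm 1. The class name, the split into separate message functions for atom-atom and atom-knowledge edges, and the GRU update are illustrative assumptions, not the authors' implementation (which should be consulted in the linked repository).

```python
# Hypothetical sketch of a knowledge-aware message passing layer (assumed names).
import torch
import torch.nn as nn

class KnowledgeAwareMPLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # separate message functions for atom-atom edges and atom-knowledge edges
        self.msg_atom = nn.Linear(dim, dim)
        self.msg_know = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, h, edge_index, edge_is_knowledge):
        # h: (num_nodes, dim) node features
        # edge_index: (2, num_edges) source/target node indices
        # edge_is_knowledge: (num_edges,) bool mask marking knowledge-graph edges
        src, dst = edge_index
        m = torch.where(
            edge_is_knowledge.unsqueeze(-1),
            self.msg_know(h[src]),
            self.msg_atom(h[src]),
        )
        # sum-aggregate messages per target node, then update node states
        agg = torch.zeros_like(h).index_add_(0, dst, m)
        return self.update(agg, h)

# toy usage: 3 nodes, 2 edges, one of them a knowledge edge
h = torch.randn(3, 16)
edges = torch.tensor([[0, 1], [1, 2]])
mask = torch.tensor([False, True])
layer = KnowledgeAwareMPLayer(16)
print(layer(h, edges, mask).shape)  # torch.Size([3, 16])
```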
Open Source Code | Yes | Our codes and data are available at https://github.com/ZJU-Fangyin/KCL.
Open Datasets | Yes | We collect 250K unlabeled molecules sampled from the ZINC15 dataset (Sterling and Irwin 2015) to pre-train KCL. [...] We use 8 benchmark datasets from MoleculeNet (Wu et al. 2018a) to perform the experiments, which cover a wide range of molecular tasks such as quantum mechanics, physical chemistry, biophysics, and physiology.
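
A minimal sketch of preparing such an unlabeled SMILES corpus (e.g. a ZINC15 sample) follows. The file name and column layout are assumptions; the authors' actual data pipeline is in the repository above.

```python
# Hedged sketch: read SMILES from a CSV and keep only those RDKit can parse.
import csv
from rdkit import Chem

def load_smiles(path: str):
    smiles = []
    with open(path) as f:
        for row in csv.DictReader(f):
            s = row["smiles"]                    # assumed column name
            if Chem.MolFromSmiles(s) is not None:
                smiles.append(s)
    return smiles

# molecules = load_smiles("zinc15_250k.csv")     # hypothetical file name
```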
Dataset Splits | Yes | For each dataset, as suggested by (Wu et al. 2018a), we apply three independent runs on three random-seeded random splits or scaffold splits with a train/validation/test ratio of 8:1:1.
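
The sketch below shows one common way to realize an 8:1:1 scaffold split in the MoleculeNet style, grouping molecules by Bemis-Murcko scaffold; tie-breaking, seed handling, and other details may differ from the authors' exact protocol.

```python
# Illustrative scaffold split (assumed details), using RDKit's Murcko scaffolds.
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # group molecule indices by their Bemis-Murcko scaffold
    groups = defaultdict(list)
    for idx, smi in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
        groups[scaffold].append(idx)
    # fill train first with the largest scaffold groups so test holds unseen scaffolds
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(smiles_list)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train += group
        elif len(valid) + len(group) <= frac_valid * n:
            valid += group
        else:
            test += group
    return train, valid, test
```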
Hardware Specification | Yes | We develop all codes on an Ubuntu server with 4 GPUs (NVIDIA GeForce 1080Ti).
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al. 2019) and Deep Graph Library (Wang et al. 2019)" but does not specify version numbers for these software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | We use the Adam optimizer with an initial learning rate of 0.0001 and a batch size of 256. For pre-training, the number of epochs is fixed at 20. The temperature τ is set to 0.1. For downstream tasks, we use early stopping on the validation set. We apply random search to obtain the best hyper-parameters based on the validation set.
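
To make the quoted hyper-parameters concrete, here is a standard NT-Xent contrastive loss with temperature 0.1 and the stated Adam settings; the KCL encoder, projection head, and hard-negative strategy are not reproduced here, so this is only an assumed baseline formulation.

```python
# Sketch of an NT-Xent contrastive objective under the quoted settings (tau = 0.1).
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same molecules."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                          # (batch, batch) similarities
    labels = torch.arange(z1.size(0), device=z1.device) # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Settings quoted from the paper (encoder itself not shown):
# optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
# pre-train for 20 epochs with batch size 256; early stopping on validation for downstream tasks
```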