Molecular Contrastive Learning with Chemical Element Knowledge Graph
Authors: Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, Huajun Chen
AAAI 2022, pp. 3968-3976 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive experiments to examine the proposed method by answering the following questions: Q1: How does KCL perform compared with state-of-the-art methods for molecular property prediction? Q2: Does the knowledge-guided graph augmentation in Module 1 learn better representations than general augmentations? Q3: How do knowledge feature initialization and graph encoders in Module 2 affect KCL? Q4: How useful are the self-supervised contrastive learning and hard negative strategy in Module 3? Q5: How can we interpret KCL(KMPNN) from a domain-specific perspective? |
| Researcher Affiliation | Academia | 1 College of Computer Science and Technology, Zhejiang University; 2 ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University; 3 AZFT Joint Lab for Knowledge Engine; 4 School of Software Technology, Zhejiang University; 5 Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University; 6 Innovation Center in Zhejiang University, State Key Laboratory of Component-Based Chinese Medicine; 7 Westlake Laboratory of Life Sciences and Biomedicine; {fangyin, qiang.zhang.cs, haihong825, zhuangxiang, 231sm, wenzhang2015, qinandming, zhuo.chen, fanxh, huajunsir}@zju.edu.cn |
| Pseudocode | Yes | Algorithm 1 describes the KMPNN encoding process. (A hypothetical message-passing sketch, not the paper's Algorithm 1, appears after this table.) |
| Open Source Code | Yes | Our codes and data are available at https://github.com/ZJU-Fangyin/KCL. |
| Open Datasets | Yes | We collect 250K unlabeled molecules sampled from the ZINC15 dataset (Sterling and Irwin 2015) to pre-train KCL. [...] We use 8 benchmark datasets from MoleculeNet (Wu et al. 2018a) to perform the experiments, which cover a wide range of molecular tasks such as quantum mechanics, physical chemistry, biophysics, and physiology. |
| Dataset Splits | Yes | For each dataset, as suggested by (Wu et al. 2018a), we apply three independent runs on three random-seeded random splits or scaffold splits with a train/validation/test ratio of 8:1:1. (A hedged scaffold-split sketch appears after this table.) |
| Hardware Specification | Yes | We develop all code on an Ubuntu server with 4 GPUs (NVIDIA GeForce 1080Ti). |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al. 2019) and Deep Graph Library (Wang et al. 2019)" but does not specify version numbers for these dependencies, which are necessary for full reproducibility. (A sketch for logging the installed versions at run time appears after this table.) |
| Experiment Setup | Yes | We use the Adam optimizer with an initial learning rate of 0.0001 and a batch size of 256. For pre-training models, the running epoch is fixed to 20. The temperature τ is set as 0.1. For downstream tasks, we use early stopping on the validation set. We apply random search to obtain the best hyper-parameters based on the validation set. (A hedged pre-training sketch wiring these values together appears after this table.) |
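
The pseudocode row points to Algorithm 1 (the KMPNN encoding process), which is only available in the paper itself. As a rough illustration of the kind of computation involved, the following is a minimal, hypothetical PyTorch sketch of one knowledge-aware message-passing layer that uses separate message functions for atom neighbors and knowledge/attribute neighbors; the module name, the GRU update, and the sum aggregation are assumptions, not the authors' Algorithm 1.

```python
# Hypothetical sketch of one knowledge-aware message-passing layer.
# NOT the paper's Algorithm 1; the split into "atom" vs. "knowledge/attribute"
# message functions and the GRU update are illustrative assumptions.
import torch
import torch.nn as nn


class KnowledgeMPLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.msg_atom = nn.Linear(dim, dim)   # messages from atom neighbors
        self.msg_attr = nn.Linear(dim, dim)   # messages from knowledge/attribute neighbors
        self.update = nn.GRUCell(dim, dim)    # node update function

    def forward(self, h, edges, is_attr):
        # h:       (N, dim) node features
        # edges:   (E, 2) long tensor of (src, dst) pairs
        # is_attr: (N,) bool mask, True for knowledge/attribute nodes
        src, dst = edges[:, 0], edges[:, 1]
        msgs = torch.where(
            is_attr[src].unsqueeze(-1),
            self.msg_attr(h[src]),
            self.msg_atom(h[src]),
        )
        agg = torch.zeros_like(h).index_add_(0, dst, msgs)  # sum-aggregate per target node
        return self.update(agg, h)


if __name__ == "__main__":
    layer = KnowledgeMPLayer(dim=16)
    h = torch.randn(4, 16)
    edges = torch.tensor([[0, 1], [1, 0], [2, 1], [3, 2]])
    is_attr = torch.tensor([False, False, True, True])
    print(layer(h, edges, is_attr).shape)  # torch.Size([4, 16])
```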
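The dataset-splits row reports 8:1:1 random or scaffold splitting following MoleculeNet. Below is a minimal sketch of a Bemis-Murcko scaffold split with that ratio using RDKit; the greedy largest-group-first assignment is a common convention and may not match the authors' exact splitter.

```python
# Hedged sketch of an 8:1:1 scaffold split based on Bemis-Murcko scaffolds (RDKit).
# Mirrors the common MoleculeNet convention; not guaranteed to match the exact
# splitter implementation the authors used.
from collections import defaultdict
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # Group molecule indices by their Bemis-Murcko scaffold SMILES.
    scaffolds = defaultdict(list)
    for idx, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable SMILES
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=mol, includeChirality=False)
        scaffolds[scaffold].append(idx)

    # Assign whole scaffold groups (largest first) to train / valid / test,
    # so molecules sharing a scaffold never cross split boundaries.
    train, valid, test = [], [], []
    n = len(smiles_list)
    for group in sorted(scaffolds.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


if __name__ == "__main__":
    smiles = ["CCO", "c1ccccc1O", "c1ccccc1N", "CC(=O)OC1=CC=CC=C1C(=O)O", "CCN"]
    print(scaffold_split(smiles))
```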
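Because the paper names PyTorch and Deep Graph Library without version numbers, a reproducer can at least record the versions actually installed when re-running the released code. A minimal sketch, assuming `torch` and `dgl` are importable:

```python
# Minimal sketch for logging the library versions used in a run,
# since the paper does not pin PyTorch / DGL versions.
import sys

import dgl
import torch

print(f"python : {sys.version.split()[0]}")
print(f"torch  : {torch.__version__} (CUDA {torch.version.cuda})")
print(f"dgl    : {dgl.__version__}")
```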
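The experiment-setup row fixes the optimizer (Adam, learning rate 0.0001), batch size (256), pre-training epochs (20), and temperature τ = 0.1. The sketch below wires those values into a generic NT-Xent-style contrastive pre-training loop; the encoder, the data loader, and the exact form of the contrastive objective (including the paper's hard-negative strategy) are placeholders rather than the authors' implementation.

```python
# Hedged sketch wiring the reported hyper-parameters (Adam, lr=1e-4, batch 256,
# 20 epochs, temperature 0.1) into a generic NT-Xent-style contrastive loss.
# `encoder` and `loader` are placeholders; this is not the authors' training code.
import torch
import torch.nn.functional as F


def nt_xent(z1, z2, tau=0.1):
    # z1, z2: (B, d) embeddings of two augmented views of the same molecules.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                 # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)     # positives sit on the diagonal


def pretrain(encoder, loader, epochs=20, lr=1e-4, tau=0.1, device="cuda"):
    # `loader` is expected to yield batches of 256 molecule pairs (two views each).
    encoder = encoder.to(device)
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    for epoch in range(epochs):
        for view1, view2 in loader:
            z1 = encoder(view1.to(device))
            z2 = encoder(view2.to(device))
            loss = nt_xent(z1, z2, tau)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```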