Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery

Authors: Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Tianqianjin Lin, Changlong Sun, Xiaozhong Liu

AAAI 2024

Each entry below lists a reproducibility variable, its result, and the supporting LLM response.
Research Type: Experimental. Extensive experiments on 15 datasets validate DGPM's effectiveness and generalizability, outperforming state-of-the-art methods in unsupervised representation learning and transfer learning settings. The autonomously discovered motifs demonstrate the potential of DGPM to enhance robustness and interpretability.
Researcher Affiliation: Collaboration. (1) Department of Information Resources Management, Zhejiang University, Hangzhou, 310058, China; (2) Alibaba Group, Hangzhou, 311121, China; (3) Northeastern University, Shenyang, 110819, China; (4) Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609-2280, USA
Pseudocode: No. The paper describes its methodology and components using text and equations (e.g., in the 'Methodology' section), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. The code is available at https://github.com/RocccYan/DGPM.
Open Datasets: Yes. To validate unsupervised representation learning, we conducted experiments on 7 graph classification benchmarks (Hou et al. 2022) from four distinct domains: MUTAG, IMDB-B, IMDB-M, PROTEINS, COLLAB, REDDIT-B, and NCI1. ... 250k unlabeled molecules sampled from the ZINC15 (Sterling and Irwin 2015) are used for pretraining and 8 molecular benchmark datasets (Wu et al. 2018) are used for finetuning and testing: BBBP, Tox21, ToxCast, SIDER, ClinTox, MUV, HIV, and BACE.
Dataset Splits: No. We followed the experimental setup employed in previous research work, such as data splits and evaluation metrics. ... for the unsupervised representation learning task, we adopted the experimental setup from (Zhang et al. 2021a; Hou et al. 2022); for the transfer learning task, we followed the setup established in (Hu et al. 2019; You et al. 2020, 2021). ... The downstream datasets are partitioned using scaffold-split to emulate real-world scenarios. ... We report the mean 10-fold cross-validation accuracy with standard deviation after 5 runs.
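As a hedged illustration of the reported evaluation protocol (mean 10-fold cross-validation accuracy with standard deviation over 5 runs), the aggregation step can be sketched in plain Python. The fold accuracies below are placeholder values, not results from the paper.

```python
import statistics

def summarize_runs(run_fold_accuracies):
    """Collapse several runs of k-fold accuracies into mean and std.

    Each run is first averaged over its folds, then the mean and
    standard deviation are taken across the run-level means.
    """
    run_means = [statistics.mean(folds) for folds in run_fold_accuracies]
    return statistics.mean(run_means), statistics.stdev(run_means)

# Placeholder accuracies (NOT from the paper): 5 runs of 10-fold CV.
runs = [
    [0.80, 0.79, 0.81, 0.82, 0.78, 0.80, 0.81, 0.79, 0.83, 0.80],
    [0.79, 0.80, 0.82, 0.81, 0.79, 0.80, 0.80, 0.78, 0.82, 0.81],
    [0.81, 0.80, 0.80, 0.83, 0.79, 0.81, 0.82, 0.80, 0.81, 0.80],
    [0.78, 0.81, 0.80, 0.80, 0.80, 0.79, 0.81, 0.80, 0.82, 0.79],
    [0.80, 0.82, 0.79, 0.81, 0.80, 0.80, 0.79, 0.81, 0.80, 0.82],
]
mean_acc, std_acc = summarize_runs(runs)
print(f"{mean_acc:.4f} +/- {std_acc:.4f}")
```

Note that this only reproduces the reporting convention; the actual fold assignment (scaffold-split for downstream molecular datasets) depends on setup details the paper defers to prior work.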
Hardware Specification: No. The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., CPU or GPU models, memory, or cloud computing resources).
Software Dependencies: No. The paper states that 'all implementations [were] carried out using the PyTorch Geometric package' but does not specify version numbers for PyTorch Geometric or any other software dependency.
Experiment Setup: Yes. The hidden dimension is set to 128 for both node and motif representations. The framework is trained using the AdamW optimizer for 100 epochs, with all implementations carried out using the PyTorch Geometric package.
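The reported hyperparameters can be sketched as a minimal PyTorch training loop. This is a stand-in, not the DGPM implementation: the paper's encoder is a graph neural network built with PyTorch Geometric, whereas the linear encoder, the input feature size, and the placeholder loss below are assumptions for illustration only.

```python
import torch
from torch import nn

HIDDEN_DIM = 128   # from the paper: node and motif representation size
EPOCHS = 100       # from the paper: training epochs
IN_FEATURES = 32   # assumption: input feature size is not reported here

# Stand-in encoder; DGPM uses a GNN encoder from PyTorch Geometric.
encoder = nn.Sequential(
    nn.Linear(IN_FEATURES, HIDDEN_DIM),
    nn.ReLU(),
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
)
optimizer = torch.optim.AdamW(encoder.parameters())  # optimizer from the paper

x = torch.randn(64, IN_FEATURES)  # dummy batch of node features
for _ in range(EPOCHS):
    optimizer.zero_grad()
    z = encoder(x)
    loss = z.pow(2).mean()        # placeholder loss, not the dual-level SSL objective
    loss.backward()
    optimizer.step()
```

The AdamW learning rate is left at its library default here, since the paper excerpt does not report one.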