Community-Invariant Graph Contrastive Learning

Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework.
Researcher Affiliation | Academia | ¹Tokyo Institute of Technology, ²The University of Tokyo, ³RIKEN.
Pseudocode | Yes | Algorithm 1 illustrates the detailed steps of developing CI-GCL.
Open Source Code | Yes | Code is released on GitHub [https://github.com/CI-GCL.git].
Open Datasets | Yes | We use TU datasets (Morris et al., 2020) and OGB datasets (Hu et al., 2020a) to evaluate graph classification and regression, respectively. For pre-training, we use 2 million unlabeled molecules sampled from the ZINC15 database (Sterling & Irwin, 2015) for the chemistry domain, and 395K unlabeled protein ego-networks derived from PPI networks (Mayr et al., 2018) representing 50 species for the biology domain.
Dataset Splits | Yes | We adopt the provided data split for the OGB datasets and use 10-fold cross-validation for the TU datasets, as they lack such a split. ... We employ 10-fold cross-validation, with 80% of the data used for training, 10% for validation, and 10% for testing.
Hardware Specification | Yes | We conduct our experiments on a single machine equipped with an Intel i9-10850K processor and an Nvidia GeForce RTX 3090Ti (24GB) GPU for the majority of datasets. For the COLLAB, RDT-B, and RDT-M5K datasets, we utilized RTX A6000 GPUs (48GB) with batch sizes exceeding 512.
Software Dependencies | Yes | The code is written in Python 3.10 and we use PyTorch 2.1.0 on CUDA 11.8 to train the model on the GPU.
Experiment Setup | Yes | The embedding size was set to 256 for both TU and OGB datasets. We conduct training for 100 epochs with a batch size of 256, utilizing the Adam optimizer with a learning rate of 0.01. ... For each fold, 10% of the data is designated as labeled training data and 10% as labeled testing data.
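The 10-fold protocol quoted in the Dataset Splits row (80% train / 10% validation / 10% test per fold) can be sketched in plain Python. This is a minimal sketch, not the authors' code: the rotation scheme (test = fold k, validation = fold k+1) and the function name `tenfold_splits` are assumptions, since the quote does not specify how the validation fold rotates.

```python
import random

def tenfold_splits(n, seed=0):
    """Yield (train, val, test) index lists for 10-fold cross-validation.

    Each fold uses 80% of the data for training, 10% for validation,
    and 10% for testing, matching the split quoted above. The choice of
    fold k as test and fold k+1 as validation is an illustrative
    assumption, not something the paper states.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Partition the shuffled indices into 10 roughly equal folds.
    folds = [idx[i::10] for i in range(10)]
    for k in range(10):
        test = folds[k]
        val = folds[(k + 1) % 10]
        # The remaining 8 folds (80% of the data) form the training set.
        train = [i for j in range(10)
                 if j not in (k, (k + 1) % 10)
                 for i in folds[j]]
        yield train, val, test
```

For n = 100 graphs this yields 10 splits of 80/10/10 indices, with each split covering the full dataset exactly once; in practice the index lists would be passed to the dataset loader for each fold.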