Community-Invariant Graph Contrastive Learning
Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. |
| Researcher Affiliation | Academia | Tokyo Institute of Technology; The University of Tokyo; RIKEN. |
| Pseudocode | Yes | Algorithm 1 illustrates the detailed steps of developing CI-GCL. |
| Open Source Code | Yes | Code is released on Github [https://github.com/CI-GCL.git]. |
| Open Datasets | Yes | We use TU datasets (Morris et al., 2020) and OGB datasets (Hu et al., 2020a) to evaluate graph classification and regression, respectively. For pre-training, we use 2 million unlabeled molecules sampled from the ZINC15 database (Sterling & Irwin, 2015) for the chemistry domain, and 395K unlabeled protein ego-networks derived from PPI networks (Mayr et al., 2018) representing 50 species for the biology domain. |
| Dataset Splits | Yes | We adopt the provided data split for the OGB datasets and use 10-fold cross-validation for the TU datasets, as they lack such a split. ... We employ 10-fold cross-validation, with 80% of the data used for training, 10% for validation, and 10% for testing. (See the data-loading sketch after this table.) |
| Hardware Specification | Yes | We conduct our experiments using a single machine equipped with an Intel i9-10850K processor and Nvidia GeForce RTX 3090 Ti (24GB) GPUs for the majority of datasets. For the COLLAB, RDT-B, and RDT-M5K datasets, we utilized RTX A6000 GPUs (48GB) with batch sizes exceeding 512. |
| Software Dependencies | Yes | The code is written in Python 3.10 and we use PyTorch 2.1.0 on CUDA 11.8 to train the model on the GPU. |
| Experiment Setup | Yes | The embedding size was set to 256 for both TU and OGB datasets. We conduct training for 100 epochs with a batch size of 256, utilizing the Adam optimizer with a learning rate of 0.01. ... For each fold, 10% of each dataset is designated as labeled training data and 10% as labeled testing data. (See the training-loop sketch after this table.) |
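
The dataset and split rows above can be reproduced with standard tooling. The sketch below is a minimal illustration, assuming PyTorch Geometric and the `ogb` package are installed; the dataset names (`MUTAG`, `ogbg-molhiv`), the random seed, and the pairing of test/validation folds are illustrative choices, not taken from the authors' released code.

```python
import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from ogb.graphproppred import PygGraphPropPredDataset

# TU datasets ship without an official split, so each fold serves once as the
# test set, the following fold as validation, and the remaining eight folds as
# training data (the 80/10/10 protocol quoted above).
dataset = TUDataset(root='data/TUDataset', name='MUTAG')  # example TU dataset
labels = [int(data.y) for data in dataset]
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = [torch.as_tensor(test) for _, test in skf.split(np.zeros(len(labels)), labels)]

k = 0                                    # index of the current fold
test_idx = folds[k]
val_idx = folds[(k + 1) % 10]
train_idx = torch.cat([f for i, f in enumerate(folds) if i not in (k, (k + 1) % 10)])
train_loader = DataLoader(dataset[train_idx], batch_size=256, shuffle=True)

# OGB graph-property datasets come with a standard split, which is used as provided.
ogb_dataset = PygGraphPropPredDataset(name='ogbg-molhiv', root='data/ogb')
split_idx = ogb_dataset.get_idx_split()  # dict with 'train' / 'valid' / 'test' indices
ogb_train_loader = DataLoader(ogb_dataset[split_idx['train']], batch_size=256, shuffle=True)
```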
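
The hyperparameters in the Experiment Setup row (embedding size 256, 100 epochs, batch size 256, Adam with learning rate 0.01) map directly onto a standard contrastive pre-training loop. The sketch below is a hedged illustration only: the two-layer GCN encoder, feature-dropout augmentation, and NT-Xent loss are generic stand-ins, not the paper's learnable community-invariant augmentation scheme. It reuses `dataset` and `train_loader` from the previous sketch.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

# Hyperparameters as quoted in the table above.
EMB_DIM, EPOCHS, BATCH_SIZE, LR = 256, 100, 256, 0.01

class Encoder(torch.nn.Module):
    """Generic two-layer GCN graph encoder (placeholder, not the CI-GCL architecture)."""
    def __init__(self, in_dim, emb_dim=EMB_DIM):
        super().__init__()
        self.conv1 = GCNConv(in_dim, emb_dim)
        self.conv2 = GCNConv(emb_dim, emb_dim)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        return global_mean_pool(h, batch)            # graph-level embeddings

def nt_xent(z1, z2, tau=0.5):
    """Normalized-temperature cross-entropy loss between two views of a batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

model = Encoder(in_dim=dataset.num_features)          # `dataset` from the loading sketch
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

for epoch in range(EPOCHS):
    for data in train_loader:                         # DataLoader(..., batch_size=BATCH_SIZE)
        optimizer.zero_grad()
        # Two augmented views; feature dropout is a simple stand-in for the
        # paper's learnable, community-invariant augmentations.
        z1 = model(F.dropout(data.x, p=0.2), data.edge_index, data.batch)
        z2 = model(F.dropout(data.x, p=0.2), data.edge_index, data.batch)
        loss = nt_xent(z1, z2)
        loss.backward()
        optimizer.step()
```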