Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Community-Invariant Graph Contrastive Learning
Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. |
| Researcher Affiliation | Academia | 1Tokyo Institute of Technology 2The University of Tokyo 3RIKEN. |
| Pseudocode | Yes | Algorithm 1 illustrates the detailed steps of developing CI-ACL. |
| Open Source Code | Yes | Code is released on Github [https://github.com/CI-GCL.git]. |
| Open Datasets | Yes | We use TU datasets (Morris et al., 2020) and OGB datasets (Hu et al., 2020a) to evaluate graph classification and regression, respectively. For pre-training, we use 2 million unlabeled molecules sampled from the ZINC15 database (Sterling & Irwin, 2015) for the chemistry domain, and 395K unlabeled protein ego-networks derived from PPI networks (Mayr et al., 2018) representing 50 species for the biology domain. |
| Dataset Splits | Yes | We adopt the provided data split for the OGB datasets and use 10-fold cross-validation for the TU datasets as it lacks such a split. ... We employ 10-fold cross-validation, with 80% of the data used for training, 10% for validation, and 10% for testing. |
| Hardware Specification | Yes | We conduct our experiments using a single machine equipped with an Intel i9-10850K processor, Nvidia GeForce RTX 3090Ti (24GB) GPUs for the majority datasets. For the COLLAB, RDT-B, and RDT-M5K datasets, we utilized RTX A6000 GPUs (48GB) with batch sizes exceeding 512. |
| Software Dependencies | Yes | The code is written in Python 3.10 and we use PyTorch 2.1.0 on CUDA 11.8 to train the model on the GPU. |
| Experiment Setup | Yes | The embedding size was set to 256 for both TU and OGB datasets. We conduct training for 100 epochs with a batch size of 256, utilizing the Adam optimizer with a learning rate of 0.01. ... For each fold, 10% of each data is designed as labeled training data and 10% as labeled testing data. |
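
The Dataset Splits row reports 10-fold cross-validation with 80% of the data for training, 10% for validation, and 10% for testing. As a rough illustration only, the sketch below shows one way such a split could be built with the Python standard library. The function name `k_fold_splits`, the seed, and the choice of using the fold after the test fold as the validation fold are all assumptions made for this example; the quoted text does not say how the authors assign the validation fold.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, val, test) index lists for each of k folds.

    Hypothetical helper: not taken from the paper's code release.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[j::k] for j in range(k)]
    for i in range(k):
        val_fold = (i + 1) % k  # assumption: the next fold serves as validation
        test = folds[i]
        val = folds[val_fold]
        # The remaining k - 2 folds (80% of the data when k = 10) form the training set.
        train = [idx for j, fold in enumerate(folds)
                 if j not in (i, val_fold) for idx in fold]
        yield train, val, test


splits = list(k_fold_splits(1000))
```

With 1000 samples, each of the 10 folds yields 800 training, 100 validation, and 100 test indices, and every sample appears in exactly one test set across the folds.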