Community-Invariant Graph Contrastive Learning
Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. |
| Researcher Affiliation | Academia | Tokyo Institute of Technology; The University of Tokyo; RIKEN. |
| Pseudocode | Yes | Algorithm 1 illustrates the detailed steps of developing CI-GCL. |
| Open Source Code | Yes | Code is released on Github [https://github.com/CI-GCL.git]. |
| Open Datasets | Yes | We use TU datasets (Morris et al., 2020) and OGB datasets (Hu et al., 2020a) to evaluate graph classification and regression, respectively. For pre-training, we use 2 million unlabeled molecules sampled from the ZINC15 database (Sterling & Irwin, 2015) for the chemistry domain, and 395K unlabeled protein ego-networks derived from PPI networks (Mayr et al., 2018) representing 50 species for the biology domain. |
| Dataset Splits | Yes | We adopt the provided data split for the OGB datasets and use 10-fold cross-validation for the TU datasets, as they lack such a split. ... We employ 10-fold cross-validation, with 80% of the data used for training, 10% for validation, and 10% for testing. (See the data-loading sketch after this table.) |
| Hardware Specification | Yes | We conduct our experiments using a single machine equipped with an Intel i9-10850K processor and Nvidia GeForce RTX 3090 Ti (24GB) GPUs for the majority of datasets. For the COLLAB, RDT-B, and RDT-M5K datasets, we utilized RTX A6000 GPUs (48GB) with batch sizes exceeding 512. |
| Software Dependencies | Yes | The code is written in Python 3.10 and we use PyTorch 2.1.0 on CUDA 11.8 to train the model on the GPU. |
| Experiment Setup | Yes | The embedding size was set to 256 for both TU and OGB datasets. We conduct training for 100 epochs with a batch size of 256, utilizing the Adam optimizer with a learning rate of 0.01. ... For each fold, 10% of each dataset is designated as labeled training data and 10% as labeled testing data. (See the training-loop sketch after this table.) |
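
The dataset and split rows above can be reproduced with standard tooling. The sketch below is a minimal illustration, assuming PyTorch Geometric and the `ogb` package are installed; the dataset names (`MUTAG`, `ogbg-molhiv`), the random seed, and the pairing of test/validation folds are illustrative choices, not taken from the authors' released code.

```python
import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from ogb.graphproppred import PygGraphPropPredDataset

# TU datasets ship without an official split, so each fold serves once as the
# test set, the following fold as validation, and the remaining eight folds as
# training data (the 80/10/10 protocol quoted above).
dataset = TUDataset(root='data/TUDataset', name='MUTAG')  # example TU dataset
labels = [int(data.y) for data in dataset]
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = [torch.as_tensor(test) for _, test in skf.split(np.zeros(len(labels)), labels)]

k = 0                                    # index of the current fold
test_idx = folds[k]
val_idx = folds[(k + 1) % 10]
train_idx = torch.cat([f for i, f in enumerate(folds) if i not in (k, (k + 1) % 10)])
train_loader = DataLoader(dataset[train_idx], batch_size=256, shuffle=True)

# OGB graph-property datasets come with a standard split, which is used as provided.
ogb_dataset = PygGraphPropPredDataset(name='ogbg-molhiv', root='data/ogb')
split_idx = ogb_dataset.get_idx_split()  # dict with 'train' / 'valid' / 'test' indices
ogb_train_loader = DataLoader(ogb_dataset[split_idx['train']], batch_size=256, shuffle=True)
```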
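
The hyperparameters in the Experiment Setup row (embedding size 256, 100 epochs, batch size 256, Adam with learning rate 0.01) map directly onto a standard contrastive pre-training loop. The sketch below is a hedged illustration only: the two-layer GCN encoder, feature-dropout augmentation, and NT-Xent loss are generic stand-ins, not the paper's learnable community-invariant augmentation scheme. It reuses `dataset` and `train_loader` from the previous sketch.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

# Hyperparameters as quoted in the table above.
EMB_DIM, EPOCHS, BATCH_SIZE, LR = 256, 100, 256, 0.01

class Encoder(torch.nn.Module):
    """Generic two-layer GCN graph encoder (placeholder, not the CI-GCL architecture)."""
    def __init__(self, in_dim, emb_dim=EMB_DIM):
        super().__init__()
        self.conv1 = GCNConv(in_dim, emb_dim)
        self.conv2 = GCNConv(emb_dim, emb_dim)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        return global_mean_pool(h, batch)            # graph-level embeddings

def nt_xent(z1, z2, tau=0.5):
    """Normalized-temperature cross-entropy loss between two views of a batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

model = Encoder(in_dim=dataset.num_features)          # `dataset` from the loading sketch
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

for epoch in range(EPOCHS):
    for data in train_loader:                         # DataLoader(..., batch_size=BATCH_SIZE)
        optimizer.zero_grad()
        # Two augmented views; feature dropout is a simple stand-in for the
        # paper's learnable, community-invariant augmentations.
        z1 = model(F.dropout(data.x, p=0.2), data.edge_index, data.batch)
        z2 = model(F.dropout(data.x, p=0.2), data.edge_index, data.batch)
        loss = nt_xent(z1, z2)
        loss.backward()
        optimizer.step()
```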