Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Provable Training for Graph Contrastive Learning

Authors: Yue Yu, Xiao Wang, Mengmei Zhang, Nian Liu, Chuan Shi

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments on various benchmarks, POT consistently improves the existing GCL approaches, serving as a friendly plugin.
Researcher Affiliation	Academia	Yue Yu¹, Xiao Wang², Mengmei Zhang¹, Nian Liu¹, Chuan Shi¹; ¹Beijing University of Posts and Telecommunications, China; ²Beihang University, China
Pseudocode	Yes	Algorithm 1: Provable Training for GCL
Open Source Code	Yes	The complete implementation can be found at https://github.com/VoidHaruhi/POT-GCL. We also provide an implementation based on GammaGL [12] at https://github.com/BUPT-GAMMA/GammaGL.
Open Datasets	Yes	We obtain the datasets from PyG [3]. Although the datasets are available for public use, we cannot find their licenses. The datasets can be found in the URLs below: Cora, CiteSeer, PubMed: https://github.com/kimiyoung/planetoid/raw/master/data BlogCatalog: https://docs.google.com/uc?export=download&id=178PqGqh67RUYMMP6SoRHDoIBh8ku5FS&confirm=t Flickr: https://docs.google.com/uc?export=download&id=1tZp3EB20fAC27SYWwax66_8uGsuU62X&confirm=t Computers, Photo: https://github.com/shchur/gnn-benchmark/raw/master/data/npz/ WikiCS: https://github.com/pmernyei/wiki-cs-dataset/raw/master/dataset
Dataset Splits	Yes	For datasets with a public split available [28], including Cora, CiteSeer, and PubMed, we follow the public split; For other datasets with no public split, we generate random splits, where each of the training set and validation set contains 10% nodes of the graph and the rest 80% nodes of the graph is used for testing.
Hardware Specification	Yes	OS: Linux 5.4.0-131-generic; CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz; GPU: GeForce RTX 3090
Software Dependencies	No	The paper mentions implementing with 'PyTorch' and using 'PyG' for datasets, but does not specify version numbers for these software components.
Experiment Setup	Yes	Table 4: Hyperparameters: (p_e^1, p_e^2) Models Cora CiteSeer PubMed Flickr BlogCatalog Computers Photo WikiCS... Table 5: Hyperparameters: (τ, κ) Models Cora CiteSeer PubMed Flickr BlogCatalog Computers Photo WikiCS... Table 6: Hyperparameters: (pot_batch, num_epochs) Models Cora CiteSeer PubMed Flickr BlogCatalog Computers Photo WikiCS
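
The Dataset Splits row describes random splits in which the training and validation sets each hold 10% of the nodes and the remaining 80% are used for testing. One way this could look is sketched below, using only the Python standard library. The function name `random_split`, the `seed` parameter, and the choice to round split sizes down are assumptions made for illustration. This is not the authors' code, and their implementation may differ.

```python
import random


def random_split(num_nodes, train_frac=0.1, val_frac=0.1, seed=0):
    """Return disjoint (train, val, test) lists of node indices.

    Hypothetical helper: the paper only states the 10%/10%/80% proportions,
    so the seeding and rounding behaviour here are assumptions.
    """
    rng = random.Random(seed)      # local RNG so the split is repeatable
    perm = list(range(num_nodes))
    rng.shuffle(perm)              # random permutation of node indices

    n_train = int(train_frac * num_nodes)  # rounded down
    n_val = int(val_frac * num_nodes)      # rounded down

    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]      # everything left over
    return train_idx, val_idx, test_idx


# Example with a made-up graph size of 1000 nodes.
train_idx, val_idx, test_idx = random_split(1000, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed makes a given split repeatable, which matters when results are averaged over several random splits. The seeds used in the paper are not stated in this summary.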