Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

CaliGCL: Calibrated Graph Contrastive Learning via Partitioned Similarity and Consistency Discrimination

Authors: Yuena Lin, Hao Wei, Hai-Chun Cai, Bohang Sun, Tao Yang, Zhen Yang, Gengyu Lyu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on multiple benchmarks show that CaliGCL effectively mitigates both types of biases and achieves state-of-the-art performance. (Abstract) In this section, we evaluate our proposed model across multiple widely used graph datasets, including citation networks (Cora, Citeseer, Pubmed, and DBLP), co-purchase networks (Amazon Photo, and Amazon Computers), social networks (COLLAB, REDDIT-BINARY, REDDIT-MULTI-5K, IMDB-BINARY), and biochemical networks (NCI1, PROTEINS, DD, and MUTAG). The detailed dataset information and hyperparameter settings for these datasets are provided in the Appendix B. In the comparative experiments, the node classification and graph classification are employed to evaluate the model expressiveness. For fair comparison, we compare our model with recent state-of-the-art self-supervised graph models for each task. All the experiments are implemented in Pytorch and conducted on a server with RTX 3090 (24 GB).
Researcher Affiliation	Collaboration	1 College of Computer Science, Beijing University of Technology, Beijing 2 College of Computer and Data Science, Fuzhou University, Fuzhou 3 Idealism Beijing Technology Co., Ltd., Beijing
Pseudocode	No	The paper describes the model architecture and training strategy in detail, but it does not contain any explicitly labeled pseudocode or algorithm blocks. The procedures are described in narrative text.
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have provided the codes and example dataset in supplemental material.
Open Datasets	Yes	In this section, we evaluate our proposed model across multiple widely used graph datasets, including citation networks (Cora, Citeseer, Pubmed, and DBLP), co-purchase networks (Amazon Photo, and Amazon Computers), social networks (COLLAB, REDDIT-BINARY, REDDIT-MULTI-5K, IMDB-BINARY), and biochemical networks (NCI1, PROTEINS, DD, and MUTAG). The detailed dataset information and hyperparameter settings for these datasets are provided in the Appendix B. In the comparative experiments, the node classification and graph classification are employed to evaluate the model expressiveness.
Dataset Splits	Yes	In our experiment setting, we split 10% representations for training the classifier, 10% for validation, and the remainder for testing. (Section 4.1) For the experiment setting, we first train a GIN encoder and then train an SVM classifier to classify the produced graph representations via 10-fold cross-validation. (Section 4.2)
Hardware Specification	Yes	All the experiments are implemented in Pytorch and conducted on a server with RTX 3090 (24 GB).
Software Dependencies	No	All the experiments are implemented in Pytorch and conducted on a server with RTX 3090 (24 GB). (Section 4) The paper mentions 'Pytorch' as the implementation framework but does not specify its version number or any other software dependencies with their respective versions.
Experiment Setup	Yes	The hyperparameter settings for node and graph classification are listed in Table 5 and 6, respectively. For the hyperparameters used in the node-level datasets, Lr_Enc is the learning rate of the graph encoder in the pre-training phase, Epc_Init_Enc is the training epochs of the graph encoder in the pre-training phase, and Hid_dim is the hidden dimension of the graph encoder. Similarly, Lr_Dis, Epc_Init_Dis, Dis_dim, Dis_act are the learning rate, training epochs, hidden dimension, and activation function for the discriminator in the pre-training phase, respectively. Leaky represents Leaky ReLU activation function. Itr_num is the number of iterations in the alternating training strategy, where the graph encoder is trained with Epc_FT_Enc epochs in each iteration and the discriminator is trained with Epc_FT_Dis epochs in each iteration. Proj_dim is the hidden dimension of the projector head, t is the dimension of the structure embedding, m is the number of feature elements in a partition, η is the threshold to distinguish the positive pairs and negative pairs for the discriminator, and τ is the temperature coefficient. pe1, pf1, pe2, and pf2 are four hyperparameters for controlling the strength of graph augmentations.
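
The two evaluation protocols quoted in the Dataset Splits row (a 10%/10%/80% train/validation/test split for node classification, and 10-fold cross-validation for graph classification) can be sketched as index partitions. This is a minimal illustration, not the authors' code: the function names `split_indices` and `k_fold_indices`, the uniform random shuffling, and the fixed seed are all assumptions, since the quoted text does not say how samples are assigned to each part.

```python
import random


def split_indices(n, train_frac=0.1, val_frac=0.1, seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Assumption: a uniform random split; the quoted text gives only the ratios.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # the remainder is used for testing
    return train, val, test


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: folds are formed by shuffling once and dealing indices
    round-robin; stratification is not modelled here.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

For node classification, `split_indices(num_nodes)` would select which learned representations train, validate, and test the classifier. For graph classification, each pair from `k_fold_indices(num_graphs)` would define one round of fitting and scoring the SVM on the graph representations.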