Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Graph Your Own Prompt

Authors: Xi Ding, Lei Wang, Piotr Koniusz, Yongsheng Gao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate GCR across diverse architectures and benchmarks, achieving consistent accuracy gains and stronger generalization without altering backbone design or training protocol. Below, we review related work. ... We evaluate GCR on several benchmark datasets, including Kaggle cats vs. dogs [22], CIFAR-10 [62], CIFAR-100 [62], Tiny ImageNet [63], and ImageNet-1K [23]. Our evaluation spans a diverse range of architectures...
Researcher Affiliation	Collaboration	1Griffith University, 2Data61/CSIRO, 3Australian National University, 4University of New South Wales
Pseudocode	No	The paper describes the methodology using mathematical formulations and textual descriptions in sections such as "3.1 Graph Consistency Layer" and "3.2 Graph Consistency Regularization". There are no explicitly labeled pseudocode or algorithm blocks, figures, or sections.
Open Source Code	Yes	GCR is model-agnostic, lightweight, and improves semantic structure across various networks and datasets. Experiments show that GCR promotes cleaner feature structure, stronger intra-class cohesion, and improved generalization, offering a new perspective on learning from prediction structure. [Project website] [Code] ... Additionally, we will release our code and pretrained models to support future research and facilitate further exploration of our approach.
Open Datasets	Yes	We evaluate GCR on several benchmark datasets, including Kaggle cats vs. dogs [22], CIFAR-10 [62], CIFAR-100 [62], Tiny ImageNet [63], and ImageNet-1K [23].
Dataset Splits	Yes	We evaluate GCR on several benchmark datasets, including Kaggle cats vs. dogs [22], CIFAR-10 [62], CIFAR-100 [62], Tiny ImageNet [63], and ImageNet-1K [23]. Our evaluation spans a diverse range of architectures, from lightweight models (e.g., MobileNet [46], ShuffleNet [124], and SqueezeNet [51]), to deeper CNNs (e.g., GoogLeNet [97], ResNet [42], DenseNet [48], ResNeXt [112], Stochastic ResNet [49], and SE-ResNet [47]). We also include transformer-based architectures such as ViT [31], Swin [70], MobileViT [77], CEiT [120], iFormer [95], ViG [37], as well as Masked Autoencoders (MAE) [41]. ... We report the mean and standard deviation over 10 runs for CIFAR-10, CIFAR-100, and Tiny ImageNet, and over 3 runs for ImageNet-1K.
Hardware Specification	Yes	Experiments run on NVIDIA V100 GPUs with 12 CPUs and 48 GB RAM. ... All experiments are conducted on NVIDIA V100 GPUs (32GB) paired with 12 CPU cores and 48GB of system RAM.
Software Dependencies	No	Libraries such as PyTorch exploit thread and GPU-level parallelism to accelerate operations like torch.bmm, functional.cosine_similarity, and torch.triu.
Experiment Setup	Yes	For CNNs, we follow [25]: 200 epochs, initial learning rate 0.1 (decayed at epochs 60/120/160), batch size 128, weight decay 5 × 10⁻⁴, and momentum 0.9. For transformers, we use AdamW with a learning rate of 1 × 10⁻⁴, cosine annealing, weight decay 5 × 10⁻², batch size 256, AMP [30], 10-epoch warm-up, and gradient clipping (norm 1.0).
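
The learning-rate schedules quoted in the Experiment Setup row can be illustrated with a small standard-library sketch. This is not the authors' code. The row does not state the step-decay factor, the total number of transformer epochs, the shape of the warm-up, or whether the schedule steps per epoch or per iteration, so `ASSUMED_GAMMA`, `ASSUMED_TRANSFORMER_EPOCHS`, the linear warm-up, the decay to zero, and per-epoch stepping are all assumptions made only for illustration. The function names are hypothetical as well.

```python
import math

# Values quoted in the Experiment Setup row.
CNN_BASE_LR = 0.1
CNN_MILESTONES = (60, 120, 160)
TRANSFORMER_BASE_LR = 1e-4
WARMUP_EPOCHS = 10

# Assumptions: neither value is given in the row above.
ASSUMED_GAMMA = 0.2               # hypothetical step-decay factor
ASSUMED_TRANSFORMER_EPOCHS = 100  # hypothetical total epoch count


def cnn_lr(epoch, base_lr=CNN_BASE_LR, milestones=CNN_MILESTONES,
           gamma=ASSUMED_GAMMA):
    """Step decay: multiply the rate by gamma once per milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def transformer_lr(epoch, base_lr=TRANSFORMER_BASE_LR, warmup=WARMUP_EPOCHS,
                   total=ASSUMED_TRANSFORMER_EPOCHS):
    """Linear warm-up (assumed), then cosine annealing to zero (assumed)."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 59, 60, 120, 160):
        print(f"CNN epoch {e:3d}: lr = {cnn_lr(e):.6f}")
    for e in (0, 9, 10, 55, 100):
        print(f"Transformer epoch {e:3d}: lr = {transformer_lr(e):.8f}")
```

Under these assumptions the CNN rate stays at 0.1 until epoch 60 and then drops at each milestone, while the transformer rate ramps up over the first 10 epochs before following a cosine curve. Anyone reproducing the paper should take the actual decay factor and epoch budget from the released code or from reference [25] rather than from this sketch.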