Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ProtoPairNet: Interpretable Regression through Prototypical Pair Reasoning

Authors: Rose Gurung, Ronilo Ragodos, Chiyu Ma, Tong Wang, Chaofan Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To demonstrate the versatility of ProtoPairNet, we evaluate it on two distinct domains: age prediction (a supervised learning task) and car racing (a behavioral cloning task with a reinforcement learning expert). Experimental results show that ProtoPairNet achieves performance competitive with black-box baselines in both settings. Meanwhile, multiple user studies further highlight that ProtoPairNet improves interpretability over conventional prototype-based models relying on one-to-one prototype comparisons. Additionally, the illustrations of our reasoning processes and global analyses empirically demonstrate the consistency and faithfulness of the prototype representations.
Researcher Affiliation	Academia	Rose Gurung University of Maine EMAIL Ronilo Ragodos University of New Hampshire EMAIL Chiyu Ma Dartmouth College EMAIL Tong Wang Yale University EMAIL Chaofan Chen University of Maine EMAIL
Pseudocode	Yes	More details of the projection algorithm, including the pseudo-code, can be found in Appendix G.
Open Source Code	Yes	Our code is available at https://github.com/Rose32/ProtoPairNet.
Open Datasets	Yes	In this case study, we evaluate ProtoPairNet on the UTKFace dataset [48] for age prediction, using only the age labels from 23,702 facial images.
Dataset Splits	Yes	Data splits are discussed in Section 3 of the main paper and Appendix H.
Hardware Specification	Yes	Each experiment was run on a single NVIDIA A100 80GB PCIe GPU with CUDA version 12.3, using 2 CPU cores and 64 GB of memory.
Software Dependencies	Yes	We implemented our models using PyTorch and conducted all experiments on a high-performance computing cluster using SLURM. Each experiment was run on a single NVIDIA A100 80GB PCIe GPU with CUDA version 12.3, using 2 CPU cores and 64 GB of memory.
Experiment Setup	Yes	This section documents the hyperparameters used to train our ProtoPairNet, as shown in Table 9. The hyperparameters shown in the table are consistent across all the architectures. The hyperparameters are chosen using grid search on a validation set for both the age prediction task and the car racing application. The full pipeline including prototype projection and fine-tuning took approximately 6 hours for age prediction (3 runs, batch size 256) and 3 hours for car racing (5 runs, batch size 128).
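
The Experiment Setup entry mentions grid search on a validation set but does not list the searched values. As a rough illustration only, the sketch below shows what such a search generally looks like. The grid values, the function names `grid_search` and `toy_validation_error`, and the scoring function are all hypothetical stand-ins, not the authors' settings or code; a real run would train the model for each configuration and measure its error on the held-out validation split.

```python
from itertools import product


def grid_search(grid, validation_error):
    """Try every combination in `grid`; return the one with the lowest error."""
    names = list(grid)
    best_config, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        error = validation_error(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


# Hypothetical grid: these candidate values are assumptions for illustration,
# not the values searched in the paper.
example_grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [128, 256],
}


def toy_validation_error(config):
    # Stand-in for "train the model, then evaluate on the validation set".
    # It is a made-up function that is smallest at one grid point.
    return abs(config["learning_rate"] - 1e-3) + abs(config["batch_size"] - 256) / 1000


best_config, best_error = grid_search(example_grid, toy_validation_error)
print(best_config, best_error)
```

Selecting on a validation split rather than the test split keeps the reported test performance from being tuned against, which is why the split details in the Dataset Splits entry matter for reproducing the search.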