Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
ProtoPairNet: Interpretable Regression through Prototypical Pair Reasoning
Authors: Rose Gurung, Ronilo Ragodos, Chiyu Ma, Tong Wang, Chaofan Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the versatility of ProtoPairNet, we evaluate it on two distinct domains: age prediction (a supervised learning task) and car racing (a behavioral cloning task with a reinforcement learning expert). Experimental results show that ProtoPairNet achieves performance competitive with black-box baselines in both settings. Meanwhile, multiple user studies further highlight that ProtoPairNet improves interpretability over conventional prototype-based models relying on one-to-one prototype comparisons. Additionally, the illustrations of our reasoning processes and global analyses empirically demonstrate the consistency and faithfulness of the prototype representations. |
| Researcher Affiliation | Academia | Rose Gurung (University of Maine); Ronilo Ragodos (University of New Hampshire); Chiyu Ma (Dartmouth College); Tong Wang (Yale University); Chaofan Chen (University of Maine) |
| Pseudocode | Yes | More details of the projection algorithm, including the pseudo-code, can be found in Appendix G. |
| Open Source Code | Yes | Our code is available at https://github.com/Rose32/ProtoPairNet. |
| Open Datasets | Yes | In this case study, we evaluate ProtoPairNet on the UTKFace dataset [48] for age prediction, using only the age labels from 23,702 facial images. |
| Dataset Splits | Yes | Data splits are discussed in Section 3 of the main paper and Appendix H. |
| Hardware Specification | Yes | Each experiment was run on a single NVIDIA A100 80GB PCIe GPU with CUDA version 12.3, using 2 CPU cores and 64 GB of memory. |
| Software Dependencies | Yes | We implemented our models using PyTorch and conducted all experiments on a high-performance computing cluster using SLURM. Each experiment was run on a single NVIDIA A100 80GB PCIe GPU with CUDA version 12.3, using 2 CPU cores and 64 GB of memory. |
| Experiment Setup | Yes | This section documents the hyperparameters used to train our ProtoPairNet, as shown in Table 9. The hyperparameters shown in the table are consistent across all the architectures. The hyperparameters are chosen using grid search on a validation set for both the age prediction task and the car racing application. The full pipeline including prototype projection and fine-tuning took approximately 6 hours for age prediction (3 runs, batch size 256) and 3 hours for car racing (5 runs, batch size 128). |