Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Parametric ρ-Norm Scaling Calibration

Authors: Siyuan Zhang, Linbo Xie

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our methods on multiple DNNs, including ResNet and VGG series. Our experiments are conducted on SVHN, CIFAR-10/100, 102 Flower, and Tiny-ImageNet for post-hoc calibration performance. Different ablation experiments are designed to evaluate efficiency of the ρ-Norm Scaling calibration structure and the multi-level objective. In tables, the best results and relative improvements over 2nd best result in each section are in bold. Results are averaged over five runs with different seeds. Baselines: In experiments, we compare our methods with different calibration methods, such as non-parametric Hist. Binning, TS, Vector Scaling (Niculescu-Mizil and Caruana 2005).
Researcher Affiliation	Academia	Siyuan Zhang, Linbo Xie*, School of Internet of Things Engineering, Jiangnan University
Pseudocode	Yes	Algorithm 1: ρ-Norm Scaling Post-hoc Calibrator
Open Source Code	No	The paper does not contain any explicit statement about releasing source code, nor does it provide any links to a code repository or mention code in supplementary materials.
Open Datasets	Yes	Our experiments are conducted on SVHN, CIFAR-10/100, 102 Flower, and Tiny-ImageNet for post-hoc calibration performance.
Dataset Splits	No	The paper mentions several datasets (SVHN, CIFAR-10/100, 102 Flower, Tiny-ImageNet) but does not provide specific details about how these datasets were split into training, validation, or test sets, such as percentages or sample counts.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	In all experiments for CIFAR-10/100, the learning rate was set to 0.1, the momentum to 0.9, the weight clipping to Norm=3, and the batch size to 128. The learning rate decreased to 10% at 40% and 80% of the iterations. The weight decay was set to 10^-4 and the iteration number was 200. For Tiny-ImageNet, the learning rate was set to 0.01 and batch size was 64. The hyperparameter α is set to 1.
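
The learning-rate schedule quoted in the Experiment Setup row can be sketched as a small step function. This is an illustrative reading only, not the authors' code (none was released): the names `CIFAR_SETUP`, `milestones`, and `learning_rate` are hypothetical, and the sketch assumes that "decreased to 10%" means the rate is multiplied by 0.1 at each of the two milestones, which the quoted text does not state unambiguously.

```python
# Hypothetical sketch of the step schedule quoted for CIFAR-10/100.
# All names here are illustrative; the paper released no reference code.

CIFAR_SETUP = {
    "base_lr": 0.1,           # initial learning rate
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "batch_size": 128,
    "total_iterations": 200,  # "iteration number" as quoted in the table
}


def milestones(total_iterations, fractions=(0.4, 0.8)):
    """Return the iteration indices at which the learning rate drops."""
    return [int(round(total_iterations * f)) for f in fractions]


def learning_rate(iteration, base_lr, total_iterations, decay=0.1):
    """Learning rate at `iteration`.

    Assumption: the rate is multiplied by `decay` once for every
    milestone that has already been reached.
    """
    drops = sum(1 for m in milestones(total_iterations) if iteration >= m)
    return base_lr * decay ** drops
```

Under this assumption, calling `learning_rate(i, CIFAR_SETUP["base_lr"], CIFAR_SETUP["total_iterations"])` keeps the initial rate until iteration 80, then applies one decay step, and a second from iteration 160 onward. For the Tiny-ImageNet values, only `base_lr` (0.01) and `batch_size` (64) are given in the row; the remaining settings are not stated there.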