Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Parametric ρ-Norm Scaling Calibration
Authors: Siyuan Zhang, Linbo Xie
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods on multiple DNNs, including the ResNet and VGG series. Our experiments are conducted on SVHN, CIFAR-10/100, 102 Flower, and Tiny-ImageNet for post-hoc calibration performance. Different ablation experiments are designed to evaluate the efficiency of the ρ-Norm Scaling calibration structure and the multi-level objective. In tables, the best results and the relative improvements over the second-best result in each section are in bold. Results are averaged over five runs with different seeds. Baselines: In experiments, we compare our methods with different calibration methods, such as non-parametric Hist. Binning, TS, and Vector Scaling (Niculescu-Mizil and Caruana 2005). |
| Researcher Affiliation | Academia | Siyuan Zhang, Linbo Xie*; School of Internet of Things Engineering, Jiangnan University |
| Pseudocode | Yes | Algorithm 1: ρ-Norm Scaling Post-hoc Calibrator |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide any links to a code repository or mention code in supplementary materials. |
| Open Datasets | Yes | Our experiments are conducted on SVHN, CIFAR-10/100, 102 Flower, and Tiny-ImageNet for post-hoc calibration performance. |
| Dataset Splits | No | The paper mentions several datasets (SVHN, CIFAR-10/100, 102 Flower, Tiny-ImageNet) but does not provide specific details about how these datasets were split into training, validation, or test sets, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In all experiments for CIFAR-10/100, the learning rate was set to 0.1, the momentum to 0.9, the weight-clipping norm to 3, and the batch size to 128. The learning rate was decreased to 10% at 40% and 80% of the iterations. The weight decay was set to 10⁻⁴ and the number of iterations was 200. For Tiny-ImageNet, the learning rate was set to 0.01 and the batch size to 64. The hyperparameter α was set to 1. |
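Temperature scaling (TS), one of the baselines listed in the Research Type row, rescales a trained network's logits by a single scalar T fitted on held-out data. This page does not describe how the authors fit it, and it is not their ρ-Norm Scaling method, so what follows is only a minimal sketch under our own assumptions: T is chosen to minimize the mean negative log-likelihood (NLL) on validation logits, the search runs over β = 1/T (where the NLL is convex) using golden-section search, and the bracket [0.001, 10] for β is arbitrary. The helper names (`log_softmax`, `nll`, `fit_temperature`, `calibrate`) are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], beta: float) -> List[float]:
    """Numerically stable log-softmax of beta * logits (beta = 1 / T)."""
    scaled = [beta * z for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def nll(all_logits: Sequence[Sequence[float]], labels: Sequence[int], beta: float) -> float:
    """Mean negative log-likelihood of the true labels at inverse temperature beta."""
    total = 0.0
    for z, y in zip(all_logits, labels):
        total -= log_softmax(z, beta)[y]
    return total / len(labels)


def fit_temperature(all_logits, labels, lo: float = 1e-3, hi: float = 10.0,
                    tol: float = 1e-8) -> float:
    """Return T minimizing validation NLL via golden-section search on beta = 1/T.

    The NLL is convex in beta, so this 1-D search finds the minimum inside
    [lo, hi]. The bracket is an assumption, not taken from the paper.
    """
    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = nll(all_logits, labels, c), nll(all_logits, labels, d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nll(all_logits, labels, c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nll(all_logits, labels, d)
    beta = (a + b) / 2
    return 1.0 / beta


def calibrate(logits: Sequence[float], T: float) -> List[float]:
    """Softmax probabilities of logits / T."""
    return [math.exp(v) for v in log_softmax(logits, 1.0 / T)]


if __name__ == "__main__":
    # Toy overconfident model: identical logits, but only 60% of labels are class 0.
    val_logits = [[5.0, 0.0]] * 10
    val_labels = [0] * 6 + [1] * 4
    T = fit_temperature(val_logits, val_labels)
    print(T, calibrate([5.0, 0.0], T))
```

In the toy example, every validation sample has logits [5, 0] but only six of ten are labeled class 0, so the fitted T is about 12.33 (that is, 5 / ln 1.5), at which the calibrated confidence for class 0 equals the empirical frequency of 0.6. Searching over β rather than T keeps the objective convex; gradient-based fitting of T is an equally common alternative.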
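The CIFAR-10/100 schedule in the Experiment Setup row is a multi-step decay: starting from 0.1, the learning rate drops to 10% at 40% and 80% of training. The sketch below assumes that the "200 iterations" count scheduler steps (the row does not say whether they are epochs or optimizer steps) and that each drop multiplies the current rate by 0.1. `step_lr` is a hypothetical helper, not code from the paper.

```python
def step_lr(step: int, base_lr: float = 0.1, total_steps: int = 200,
            milestone_fracs: tuple = (0.4, 0.8), factor: float = 0.1) -> float:
    """Learning rate at `step` under a multi-step decay schedule.

    Assumption: each milestone multiplies the current rate by `factor`
    (0.1, i.e. "decreased to 10%"); milestones sit at the given fractions
    of `total_steps`.
    """
    # For total_steps=200 these are steps 80 and 160.
    milestones = [round(f * total_steps) for f in milestone_fracs]
    # Count how many milestones have already been passed.
    drops = sum(1 for m in milestones if step >= m)
    return base_lr * factor ** drops


if __name__ == "__main__":
    # Print the rate on either side of each milestone of the CIFAR schedule.
    for t in (0, 79, 80, 159, 160, 199):
        print(t, step_lr(t))
```

With the defaults, the milestones fall at steps 80 and 160, giving rates of 0.1, 0.01, and 0.001 on the three segments. For Tiny-ImageNet the row reports a base rate of 0.01, which corresponds to `step_lr(t, base_lr=0.01)` only if the same decay is applied; the row does not say so. In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 160], gamma=0.1)` produces the same values when stepped once per epoch.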