Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
On the Limitations of Temperature Scaling for Distributions with Overlaps
Authors: Muthu Chidambaram, Rong Ge
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, in Section 5 we show that our theoretical results accurately reflect practice by considering both synthetic data and image classification benchmarks. 5 EXPERIMENTS |
| Researcher Affiliation | Academia | Muthu Chidambaram Department of Computer Science Duke University EMAIL Rong Ge Department of Computer Science Duke University EMAIL |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code used to generate all of the plots in this paper can be found in the associated Github Repository: https://github.com/2014mchidamb/temp-scaling-limitations. |
| Open Datasets | Yes | We can also verify that the phenomena observed in synthetic data translates to the more realistic benchmarks of CIFAR-10, CIFAR-100, and SVHN. |
| Dataset Splits | Yes | For each training dataset considered in this section, we set aside 10% of the data for calibration. |
| Hardware Specification | Yes | on a single A5000 GPU |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or any other software dependencies. |
| Experiment Setup | Yes | All models were trained for 200 epochs using Adam (Kingma & Ba, 2015) with the standard hyperparameters of β1 = 0.9, β2 = 0.999, a learning rate of 0.001, and a batch size of 500 |
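
The split and hyperparameters quoted in the table can be collected into a small sketch. This is an illustration under stated assumptions, not the authors' code (their repository is linked in the Open Source Code row): the names `TRAIN_CONFIG`, `split_for_calibration`, `nll`, and `fit_temperature` are hypothetical, the random shuffle and seed are assumed, and the grid search over temperatures stands in for whatever optimizer the paper actually uses to fit the temperature on the held-out calibration data.

```python
import math
import random

# Values quoted in the "Experiment Setup" row; the dict layout is hypothetical.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),
    "learning_rate": 1e-3,
    "batch_size": 500,
    "epochs": 200,
}


def split_for_calibration(n_examples, calib_fraction=0.10, seed=0):
    """Return (train_indices, calib_indices) with calib_fraction held out.

    The shuffle and the seed are assumptions; the table only states that
    10% of each training dataset is set aside for calibration.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_calib = int(round(n_examples * calib_fraction))
    return indices[n_calib:], indices[:n_calib]


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of softmax(logits / temperature)."""
    total = 0.0
    for row, label in zip(logits, labels):
        scaled = [z / temperature for z in row]
        top = max(scaled)  # subtract the max for numerical stability
        log_norm = top + math.log(sum(math.exp(s - top) for s in scaled))
        total += log_norm - scaled[label]
    return total / len(labels)


def fit_temperature(logits, labels, grid=None):
    """Pick the temperature with the lowest calibration NLL.

    Grid search is an assumed stand-in for a gradient-based fit.
    """
    if grid is None:
        grid = [0.05 * k for k in range(1, 201)]  # 0.05, 0.10, ..., 10.0
    return min(grid, key=lambda t: nll(logits, labels, t))
```

As a usage example, `split_for_calibration(1000)` yields 900 training indices and 100 calibration indices, and `fit_temperature` applied to calibration logits that are more confident than their accuracy warrants returns a temperature above 1, which softens the predicted probabilities.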