Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation

Authors: Volodymyr Kuleshov, Shachi Deshpande

ICML 2022

Reproducibility assessment: for each variable, the assessed result and the LLM response quoted as evidence.

Research Type
Result: Experimental
LLM Response: "Our results yield empirical performance improvements on linear and deep Bayesian models and suggest that calibration should be increasingly leveraged across machine learning." and "Empirically, we find that our method consistently outputs well-calibrated predictions in linear and deep Bayesian models, and improves performance on downstream tasks with minimal implementation overhead."

Researcher Affiliation
Result: Academia
LLM Response: "Department of Computer Science, Cornell Tech and Cornell University, New York, NY."

Pseudocode
Result: Yes
LLM Response: "Algorithm 1 Distribution Recalibration Framework", "Algorithm 2 Distribution Calibrated Regression", and "Algorithm 3 Distribution Calibrated Classification".
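The paper's pseudocode is not reproduced here, but the sketch below gives a minimal, hypothetical picture of what Algorithm 1 (Distribution Recalibration Framework) describes, based only on details quoted elsewhere in this report: a held-out recalibration split, a quantile featurization of the forecast F, and a small recalibrator R. Interfaces such as `predict_dist`, `featurize_forecast`, and the `fit` methods are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Nine quantile levels [0.1, ..., 0.9], matching the paper's featurization.
QUANTILE_LEVELS = np.linspace(0.1, 0.9, 9)

def featurize_forecast(forecast, levels=QUANTILE_LEVELS):
    """Summarize a predicted distribution F(y|x) by a vector of its quantiles.

    `forecast` is assumed to expose an inverse CDF via `.ppf()` (as
    scipy.stats distributions do); this interface is an assumption.
    """
    return np.array([forecast.ppf(q) for q in levels])

def recalibrate(base_model, recalibrator, train, recal):
    """Hypothetical sketch of the distribution recalibration loop:

    1. Fit the base probabilistic model on the training split.
    2. On the held-out recalibration split, featurize each forecast F(y|x)
       and fit a recalibrator R mapping these features to a
       better-calibrated predictive distribution over y.
    3. At test time, compose the two models.
    """
    X_train, y_train = train
    X_recal, y_recal = recal

    base_model.fit(X_train, y_train)

    features = np.stack([featurize_forecast(base_model.predict_dist(x))
                         for x in X_recal])
    recalibrator.fit(features, y_recal)

    def predict_dist(x):
        return recalibrator.predict_dist(
            featurize_forecast(base_model.predict_dist(x)))

    return predict_dist
```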
Open Source Code
Result: No
LLM Response: The paper does not include an explicit statement about open-sourcing code or a link to a code repository.

Open Datasets
Result: Yes
LLM Response: "We use a number of UCI regression datasets varying in size from 194 to 8192 training instances; each training input may have between 6 and 159 continuous features. ... We also perform classification on the following standard datasets: MNIST, SVHN, CIFAR10."

Dataset Splits
Result: Yes
LLM Response: "We randomly use 25% of each dataset for testing, and use the rest for training. We held out 15% of the training set (up to a max of 500 datapoints) for recalibration."
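The quoted protocol (25% of each dataset for testing, the remainder for training, and 15% of the training set, capped at 500 points, held out for recalibration) can be written as a short splitting helper. This is a sketch under assumptions: the random seed, the use of NumPy, and which training points end up in the recalibration set are not specified in the paper.

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Sketch of the paper's splits: 25% test, rest train, with 15% of the
    train split (at most 500 points) held out for recalibration."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))

    n_test = int(0.25 * len(X))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    n_recal = min(int(0.15 * len(train_idx)), 500)
    recal_idx, fit_idx = train_idx[:n_recal], train_idx[n_recal:]

    return ((X[fit_idx], y[fit_idx]),
            (X[recal_idx], y[recal_idx]),
            (X[test_idx], y[test_idx]))
```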
Hardware Specification
Result: No
LLM Response: The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instances used for running the experiments.

Software Dependencies
Result: No
LLM Response: The paper mentions "implemented easily within deep learning frameworks" but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA).

Experiment Setup
Result: Yes
LLM Response: "In our UCI experiments, we use fully-connected feedforward neural networks with two layers of 128 hidden units with a dropout rate of 0.5 and parametric ReLU non-linearities. ... Our recalibrator R was also a densely connected neural network with two fully connected hidden layers of 20 units each and parametric ReLU non-linearities. ... In regression experiments, we featurized input distributions F using nine quantiles [0.1, ..., 0.9]. We trained R using the quantile regression objective of Algorithm 2; ..."
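The quoted setup pins down layer sizes and the featurization but not the software stack, so the sketch below renders it in PyTorch purely as an assumption. Only the widths (two hidden layers of 128 units with dropout 0.5 and PReLU for the base UCI model; two hidden layers of 20 units for the recalibrator R), the nine-quantile featurization, and the use of a quantile-regression objective come from the quoted text; the pinball loss shown is a generic formulation, not necessarily the exact objective of Algorithm 2.

```python
import torch
import torch.nn as nn

# Nine quantile levels [0.1, ..., 0.9] used to featurize forecast distributions F.
QUANTILES = torch.linspace(0.1, 0.9, 9)

def make_base_network(n_features, n_outputs):
    """Base UCI model: two hidden layers of 128 units, dropout 0.5, PReLU."""
    return nn.Sequential(
        nn.Linear(n_features, 128), nn.PReLU(), nn.Dropout(0.5),
        nn.Linear(128, 128), nn.PReLU(), nn.Dropout(0.5),
        nn.Linear(128, n_outputs),
    )

def make_recalibrator():
    """Recalibrator R: two hidden layers of 20 units with PReLU; it takes the
    nine-quantile featurization of F and outputs nine recalibrated quantiles."""
    return nn.Sequential(
        nn.Linear(len(QUANTILES), 20), nn.PReLU(),
        nn.Linear(20, 20), nn.PReLU(),
        nn.Linear(20, len(QUANTILES)),
    )

def pinball_loss(pred_quantiles, y, levels=QUANTILES):
    """Generic quantile-regression (pinball) loss, one output column per level."""
    diff = y.unsqueeze(-1) - pred_quantiles  # shape (batch, n_levels)
    return torch.maximum(levels * diff, (levels - 1.0) * diff).mean()
```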