Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Towards neural networks that provably know when they don't know

Authors: Alexander Meinke, Matthias Hein

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In the experiments, we show that state-of-the-art methods fail in this worst-case setting, whereas our model can guarantee its performance while retaining state-of-the-art OOD performance.
Researcher Affiliation	Academia	Alexander Meinke, University of Tübingen; Matthias Hein, University of Tübingen
Pseudocode	No	The paper describes the model and estimation process mathematically but does not include any explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Code at https://github.com/AlexMeinke/certified-certain-uncertainty
Open Datasets	Yes	We evaluate the worst-case performance of various OOD detection methods within regions for which CCU yields guarantees and by standard OOD on MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR10 and CIFAR100 (Krizhevsky & Hinton, 2009).
Dataset Splits	Yes	We evaluate the worst-case performance of various OOD detection methods within regions for which CCU yields guarantees and by standard OOD on MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR10 and CIFAR100 (Krizhevsky & Hinton, 2009).
Hardware Specification	No	The paper does not provide specific hardware details; it mentions only model architectures (e.g., LeNet, ResNet18, VGG).
Software Dependencies	No	The paper mentions optimizers (ADAM, SGD) and neural network architectures (LeNet, ResNet18, VGG) but does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	Unless specified otherwise, we use ADAM on MNIST with a learning rate of 1e-3 and SGD with a learning rate of 0.1 for the other datasets. The learning rate for the GMM is always set to 1e-5. We decrease all learning rates by a factor of 10 after 50, 75 and 90 epochs. Our batch size is 128, the total number of epochs is 100, and weight decay is set to 5e-4.
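The Experiment Setup row describes a step learning-rate schedule shared by the classifier and the GMM. The minimal Python sketch that follows encodes the quoted hyperparameters and computes the learning rate in effect at a given epoch. The constant names, the helper functions `optimizer_settings` and `lr_at_epoch`, and the reading of "after 50, 75 and 90 epochs" as "from 0-based epoch 50 onward" are assumptions made for illustration; this is not code from the authors' repository.

```python
# Hypothetical encoding of the hyperparameters quoted in the Experiment Setup row.
# Names and structure are illustrative assumptions, not taken from the authors' code.

MNIST_ADAM_LR = 1e-3        # ADAM learning rate on MNIST
OTHER_SGD_LR = 0.1          # SGD learning rate on all other datasets
GMM_LR = 1e-5               # learning rate for the GMM parameters
MILESTONES = (50, 75, 90)   # epochs after which all learning rates are divided
DECAY_FACTOR = 10           # "decrease ... by a factor of 10"
BATCH_SIZE = 128
EPOCHS = 100
WEIGHT_DECAY = 5e-4


def optimizer_settings(dataset: str) -> dict:
    """Return the optimizer choice and base learning rate for a dataset name."""
    if dataset == "MNIST":
        return {"optimizer": "ADAM", "lr": MNIST_ADAM_LR, "weight_decay": WEIGHT_DECAY}
    return {"optimizer": "SGD", "lr": OTHER_SGD_LR, "weight_decay": WEIGHT_DECAY}


def lr_at_epoch(base_lr: float, epoch: int) -> float:
    """Step schedule: divide base_lr by DECAY_FACTOR once per milestone reached.

    Assumes 0-based epochs, so "after 50 epochs" means epoch index >= 50.
    """
    n_decays = sum(1 for m in MILESTONES if epoch >= m)
    return base_lr / (DECAY_FACTOR ** n_decays)


if __name__ == "__main__":
    settings = optimizer_settings("CIFAR10")
    for epoch in (0, 50, 75, 90, EPOCHS - 1):
        print(epoch, lr_at_epoch(settings["lr"], epoch), lr_at_epoch(GMM_LR, epoch))
```

For example, `lr_at_epoch(0.1, 80)` applies two decays to the SGD base rate, giving roughly 0.001. In PyTorch, a comparable schedule would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[50, 75, 90]` and `gamma=0.1`, although the record does not state which scheduler the authors actually used.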