Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Lightning UQ Box: Uncertainty Quantification for Neural Networks

Authors: Nils Lehmann, Nina Maria Gottschling, Jakob Gawlikowski, Adam J. Stewart, Stefan Depeweg, Eric Nalisnick

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Lightning UQ Box works towards this goal by supporting configuration of experiments with simple configuration files, as well as the Lightning command line interface (CLI). For example, the required configurations to run a partially stochastic BNN or Deep Kernel Learning model based on the timm library ResNet18 implementation on the EuroSAT dataset from torchgeo is shown in Figure 2. And also "to adequately evaluate the efficacy of these methods for various applications, a common modeling framework is necessary to foster the reproducibility of experiments, provide a fair evaluation, and make UQ methods more easily accessible to various research domains."
Researcher Affiliation	Collaboration	Nils Lehmann EMAIL Data Science in Earth Observation, Technical University of Munich; Stefan Depeweg EMAIL Siemens AG; Eric Nalisnick EMAIL Johns Hopkins University
Pseudocode	No	The paper describes a software library and its design principles, but does not present any pseudocode or algorithm blocks for novel methods.
Open Source Code	Yes	Lightning UQ Box¹ aims to fill this gap... ¹Lightning UQ Box GitHub repository and documentation
Open Datasets	Yes	For example, the required configurations to run a partially stochastic BNN or Deep Kernel Learning model based on the timm library ResNet18 implementation on the EuroSAT dataset from torchgeo is shown in Figure 2. ... (right) the same ResNet18 as Deep Kernel Learning model for training on the EuroSAT classification dataset from the geospatial PyTorch domain library TorchGeo (Stewart et al., 2022).
Dataset Splits	No	Figure 2 shows example YAML files for configuring models to train on the EuroSAT dataset, including a 'batch_size: 64'. However, the paper does not explicitly state the training, validation, and test splits (e.g., percentages or sample counts) used for the dataset.
Hardware Specification	No	The paper describes a software library and its functionalities, but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for any experiments or development.
Software Dependencies	No	The paper mentions several software components like PyTorch, PyTorch Lightning, timm, and torchgeo, but does not provide specific version numbers for these dependencies, which are necessary for reproducible experiments.
Experiment Setup	Yes	Figure 2: Example YAML files that configure (left) a partially stochastic BNN based on a timm ResNet18 model implementation and (right) the same ResNet18 as Deep Kernel Learning model for training on the EuroSAT classification dataset from the geospatial PyTorch domain library TorchGeo (Stewart et al., 2022). The YAML files include hyperparameters such as 'num_mc_samples_train: 3', 'num_mc_samples_test: 25', 'batch_size: 64', 'max_epochs: 40', 'gradient_clip_val: 1.0', and 'accumulate_grad_batches: 2'.
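
The hyperparameters quoted in the Experiment Setup row can be collected in one place to see how they interact. The following is a minimal sketch, not the paper's configuration: the flat dictionary layout and the helper name effective_batch_size are assumptions for illustration, and it assumes all quoted values belong to the same YAML file, which the row above does not confirm. With gradient accumulation, each optimizer step on a single device aggregates batch_size × accumulate_grad_batches samples.

```python
# Values quoted in the Experiment Setup row (Figure 2 of the paper).
# ASSUMPTION: a flat dict is used here for illustration; the nesting of
# the original YAML files is not reproduced, and the values may come
# from two different files.
hparams = {
    "num_mc_samples_train": 3,
    "num_mc_samples_test": 25,
    "batch_size": 64,
    "max_epochs": 40,
    "gradient_clip_val": 1.0,
    "accumulate_grad_batches": 2,
}


def effective_batch_size(cfg: dict) -> int:
    """Samples contributing to one optimizer step on a single device."""
    return cfg["batch_size"] * cfg["accumulate_grad_batches"]


print(effective_batch_size(hparams))  # 128
```

Under these assumptions the effective batch size is 128 rather than the 64 listed as 'batch_size', which matters when comparing runs against other reported setups.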