Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Lightning UQ Box: Uncertainty Quantification for Neural Networks

Authors: Nils Lehmann, Nina Maria Gottschling, Jakob Gawlikowski, Adam J. Stewart, Stefan Depeweg, Eric Nalisnick

JMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Lightning UQ Box works towards this goal by supporting configuration of experiments with simple configuration files, as well as the Lightning command line interface (CLI). For example, the required configurations to run a partially stochastic BNN or Deep Kernel Learning model based on the timm library ResNet18 implementation on the EuroSAT dataset from torchgeo is shown in Figure 2. And also "to adequately evaluate the efficacy of these methods for various applications, a common modeling framework is necessary to foster the reproducibility of experiments, provide a fair evaluation, and make UQ methods more easily accessible to various research domains." |
| Researcher Affiliation | Collaboration | Nils Lehmann (EMAIL), Data Science in Earth Observation, Technical University of Munich; Stefan Depeweg (EMAIL), Siemens AG; Eric Nalisnick (EMAIL), Johns Hopkins University |
| Pseudocode | No | The paper describes a software library and its design principles, but does not present any pseudocode or algorithm blocks for novel methods. |
| Open Source Code | Yes | Lightning UQ Box [footnote 1] aims to fill this gap... Footnote 1: Lightning UQ Box GitHub repository and documentation |
| Open Datasets | Yes | For example, the required configurations to run a partially stochastic BNN or Deep Kernel Learning model based on the timm library ResNet18 implementation on the EuroSAT dataset from torchgeo is shown in Figure 2. ... (right) the same ResNet18 as Deep Kernel Learning model for training on the EuroSAT classification dataset from the geospatial PyTorch domain library TorchGeo (Stewart et al., 2022). |
| Dataset Splits | No | Figure 2 shows example YAML files for configuring models to train on the EuroSAT dataset, including a 'batch_size: 64'. However, the paper does not explicitly state the training, validation, and test splits (e.g., percentages or sample counts) used for the dataset. |
| Hardware Specification | No | The paper describes a software library and its functionalities, but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for any experiments or development. |
| Software Dependencies | No | The paper mentions several software components like PyTorch, PyTorch Lightning, timm, and torchgeo, but does not provide specific version numbers for these dependencies, which are necessary for reproducible experiments. |
| Experiment Setup | Yes | Figure 2: Example YAML files that configure (left) a partially stochastic BNN based on a timm ResNet18 model implementation and (right) the same ResNet18 as Deep Kernel Learning model for training on the EuroSAT classification dataset from the geospatial PyTorch domain library TorchGeo (Stewart et al., 2022). The YAML files include hyperparameters such as 'num_mc_samples_train: 3', 'num_mc_samples_test: 25', 'batch_size: 64', 'max_epochs: 40', 'gradient_clip_val: 1.0', and 'accumulate_grad_batches: 2'. |
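
The hyperparameters quoted in the Experiment Setup row interact: with gradient accumulation, each optimizer step sees more samples than a single batch. The following is a minimal sketch of that arithmetic, not code from the paper or the library. The flat dictionary layout, the function name `effective_batch_size`, and the `num_devices` parameter are assumptions made for illustration; the paper does not report hardware, so the device count defaults to 1.

```python
# Hyperparameter values as quoted from Figure 2 of the paper.
# The flat layout is illustrative; the nesting used in the actual
# YAML files is not reproduced here.
figure2_hparams = {
    "num_mc_samples_train": 3,
    "num_mc_samples_test": 25,
    "batch_size": 64,
    "max_epochs": 40,
    "gradient_clip_val": 1.0,
    "accumulate_grad_batches": 2,
}


def effective_batch_size(hparams, num_devices=1):
    """Number of samples contributing to one optimizer step.

    num_devices is a hypothetical parameter: the paper does not
    state how many devices were used.
    """
    return hparams["batch_size"] * hparams["accumulate_grad_batches"] * num_devices


print(effective_batch_size(figure2_hparams))  # 64 * 2 * 1 = 128
```

Under the single-device assumption, the quoted settings correspond to 128 samples per optimizer step.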
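
The Software Dependencies row notes that no version numbers are given. One way a reader re-running the experiments could record them is sketched below, using only the Python standard library. The distribution names in `PACKAGES` are assumptions about how the four libraries named in the table are published; adjust them to match the local environment (for example, PyTorch Lightning may be installed as `pytorch-lightning` rather than `lightning`).

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the libraries named in the table.
PACKAGES = ("torch", "lightning", "timm", "torchgeo")


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print in a requirements-style format; missing packages become comments.
    for name, ver in collect_versions().items():
        print(f"{name}=={ver}" if ver else f"# {name}: not installed")
```

Saving this output alongside experiment results would pin the environment that the paper leaves unspecified.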