Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

VIKING: Deep variational inference with stochastic projections

Authors: Samuel Matthiesen, Hrittik Roy, Nicholas Krämer, Yevgen Zainchkovskyy, Stas Syrota, Alejandro Valverde Mahou, Carl Henrik Ek, Søren Hauberg

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments We evaluate VIKING against popular Bayesian deep learning methods using standard benchmarks for Bayesian deep learning. ... The results in Table 1 show that our ELBO generalize better in most cases... Image classification. Using standard benchmarks, we evaluate our method against the maximum a posteriori (MAP) estimate, the (loss-projected) post-hoc method from Miani et al. (2025), IVON (Shen et al., 2024), SWAG (Maddox et al., 2019), and a last-layer Laplace approximation.
Researcher Affiliation	Academia	Samuel G. Fadel, Hrittik Roy, Nicholas Krämer, Yevgen Zainchkovskyy, Stas Syrota, Alejandro Valverde Mahou (Technical University of Denmark); Carl Henrik Ek (University of Cambridge); Søren Hauberg (Technical University of Denmark)
Pseudocode	Yes	Algorithm 1: The VIKING algorithm. Input: Initial values of (θ̂, σ_ker, σ_im), dataset with B mini-batches. Output: Optimized values of (θ̂, σ_ker, σ_im), representing q(θ) (Equation 2).
Open Source Code	Yes	Both the code for reproducing our experiments (https://github.com/eugene/viking-paper-experiments) and an easy-to-use library (https://github.com/fadel/viking) are available online.
Open Datasets	Yes	Image classification. Using standard benchmarks... On MNIST (LeCun et al., 2010) and Fashion MNIST (Xiao et al., 2017), we train a LeNet (LeCun et al., 1989) model, while on SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky and Hinton, 2009) we use a small ResNet (He et al., 2016).
Dataset Splits	Yes	Image classification. Using standard benchmarks, we evaluate our method against the maximum a posteriori (MAP) estimate... Unless otherwise noted, all data is standardized, that is, every data point is first subtracted by the mean of training data and the result is divided by the standard deviation of training data, both computed across dimensions. Naturally, this is performed also for the validation and test sets. When evaluating on validation and test sets, we draw 20 posterior samples.
Hardware Specification	No	The supplementary material provides these details; all experiments are carried out on conventional hardware (x86 architecture, NVIDIA GPUs).
Software Dependencies	No	The paper does not provide specific version numbers for software libraries or frameworks used in the implementation of VIKING. It only mentions JAX for a baseline method (SGLD) without a version.
Experiment Setup	Yes	C Experimental details: For reproduction purposes, we detail each experiment (in order of appearance in the main paper). Note that the code implementing VIKING and for reproducing the experiments is publicly available. We start with a few general experimental and implementation details used across the experiments. All optimization steps were done using the Adam optimizer. In what follows, we address specific components of the optimization that are shared across all experiments, unless otherwise noted. Table 7: Hyperparameters broken down per dataset. MC samples indicates the number of posterior samples used for training. Per-dataset values (MNIST / F-MNIST / SVHN / CIFAR-10): batch size 32 / 16 / 32 / 32; warmup epochs 50 / 20 / 50 / 20; learning rate 10^-4 / 10^-4 / 10^-4 / 10^-4; β 10^-5 / 10^-5 / 10^-6 / 10^-6; γ 0.8 / 0.2 / 0.8 / 0.5. Shared values: MC samples 1; warmup learning rate 10^-3; initial log α 4.0; initial log σ_im 2.0; σ_ker, σ_im tuning 5 epochs; ELBO optimization 50 epochs.
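
The standardization quoted in the Dataset Splits row (subtract the training mean, divide by the training standard deviation, and reuse the same statistics for validation and test data) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `fit_standardizer` and `standardize` are hypothetical, the toy data is invented for illustration, and reading "computed across dimensions" as a single scalar mean and standard deviation over all training values is an assumption.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_rows):
    """Return (mean, std) computed over every value in the training set.

    Assumption: one scalar mean/std pooled across all dimensions.
    """
    values = [v for row in train_rows for v in row]
    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population standard deviation
    if std == 0.0:
        raise ValueError("training data has zero variance")
    return mean, std


def standardize(rows, mean, std):
    """Apply training-set statistics to any split (train, validation, test)."""
    return [[(v - mean) / std for v in row] for row in rows]


# Toy data, invented for illustration only.
train = [[0.0, 2.0], [4.0, 6.0]]
test = [[3.0, 5.0]]

mean, std = fit_standardizer(train)        # statistics come from train only
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)    # test reuses the train statistics
```

The point of the design is that the held-out splits never contribute to the statistics, so no information from validation or test data leaks into preprocessing.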