Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models
Authors: Oleg Arenz, Philipp Dahlinger, Zihan Ye, Michael Volpp, Gerhard Neumann
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we evaluate several design choices based on a highly modular implementation of our generalized framework. We propose a novel combination of design choices and show that it significantly outperforms both prior methods. ... We evaluate each candidate on three target distributions: In the Breast Cancer experiment ... Further, we introduce the variant Breast Cancer MB ... Thirdly, we evaluate the candidates in a more challenging Bayesian neural network experiment on the Wine dataset ... We tuned the hyperparameters of each of the 18 candidates separately for each test problem using Bayesian optimization. ... The results of the hyperparameter search are summarized in Table 2 ... We also show for every experiment a second metric, where we use the maximum mean discrepancy (Gretton et al., 2012) (MMD) for Breast Cancer and Breast Cancer MB ... and the mean squared error of the Bayesian inference predictions for Wine. |
| Researcher Affiliation | Academia | Oleg Arenz (EMAIL), Intelligent Autonomous Systems, Technical University of Darmstadt; Philipp Dahlinger (EMAIL), Autonomous Learning Robots, Karlsruhe Institute of Technology; Zihan Ye (EMAIL), Artificial Intelligence & Machine Learning, Hessian Center for AI (hessian.AI), Technical University of Darmstadt; Michael Volpp (EMAIL) and Gerhard Neumann (EMAIL), Autonomous Learning Robots, Karlsruhe Institute of Technology |
| Pseudocode | Yes | We can unify both methods in a common framework (Algorithm 1) with seven modules that can be implemented depending on the design choice: (1) The Sample Selector ... (7) the Component Adaptation module ... Algorithm 1 Natural Gradient GMM Variational Inference |
| Open Source Code | Yes | Along with this work, we publish our highly modular and efficient implementation for natural gradient variational inference with Gaussian mixture models, which supports 432 different combinations of design choices, facilitates the reproduction of all our experiments, and may prove valuable for the practitioner. ... We release the open-source implementation of our generalized framework for GMM-based VI. Our implementation allows each design choice to be set independently and outperforms the reference implementations of iBayes-GMM and VIPS when using their respective design choices. A separate reproducibility package contains the scripts we used for starting each experiment, including hyperparameter optimization. ... To increase the transparency of our empirical study we release a separate reproducibility package (https://github.com/OlegArenz/gmmvi_reproducibility), which contains scripts for running each experiment, thereby documenting the exact conditions under which all our experiments have been started (including hyperparameter search). |
| Open Datasets | Yes | In the Breast Cancer experiment Arenz et al. (2018) we perform Bayesian logistic regression using the full Breast Cancer dataset (Lichman, 2013). ... Bayesian neural network experiment on the Wine dataset (Lichman, 2013) ... the 25-dimensional German Credit dataset (Lichman, 2013); GMM100 and STM300 are higher-dimensional variants ... The data sets can be obtained from the UCI Machine Learning Repository (Lichman, 2013) |
| Dataset Splits | No | The paper mentions: 'We split the data set into a training and test set, and make sure that the training and test sets are deterministic given the seed.' for the Wine experiment. It also mentions minibatches of size 64 or 128. However, it does not provide specific percentages or absolute sample counts for the training, validation, or test splits for any dataset, nor does it refer to a specific predefined split with a citation for replication. |
| Hardware Specification | No | The paper states: 'We granted each candidate exclusive access to a compute node with 96 cores for two days per experiment.' and 'The authors gratefully acknowledge the computing time provided to them on the high-performance computer Lichtenberg at the NHR Centers NHR4CES at TU Darmstadt.' and 'This work was performed on the HoreKa supercomputer...'. While compute resources are mentioned, specific hardware details such as exact CPU or GPU models, or detailed specifications of the supercomputers (beyond a general '96 cores' per node), are not provided. |
| Software Dependencies | No | The paper mentions 'our Tensorflow (Abadi et al., 2015) implementation'. However, it does not specify a version number for TensorFlow or any other key software libraries used in their implementation. |
| Experiment Setup | Yes | We tuned the hyperparameters of each of the 18 candidates separately for each test problem using Bayesian optimization. ... We used the best hyperparameters from the Bayesian optimization and evaluated the performance over ten seeds. ... For selecting the hyperparameters, we perform for each candidate and each experiment a small grid search, where we make use of the results from the previous experiments to select suitable ranges. ... We list the hyperparameters for each design choice in Table 7. Please refer to Appendix I for a description of the different hyperparameters. The tested and eventually chosen hyperparameters for each experiment can be found in the reproducibility package |
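The Research Type row notes that the paper reports the maximum mean discrepancy (Gretton et al., 2012) as a secondary metric for the Breast Cancer experiments. For reference, here is a minimal sketch of the biased squared-MMD estimator with an RBF kernel; the bandwidth, sample sizes, and toy Gaussians are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    # Pairwise RBF kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2)).
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-d2 / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between sample sets x and y."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
# Same distribution vs. a mean-shifted one: the former should score lower.
same = mmd2(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
diff = mmd2(rng.normal(size=(500, 2)), rng.normal(3.0, 1.0, size=(500, 2)))
print(same, diff)
```

The biased estimator is a squared RKHS norm, so it is always non-negative; samples from matching distributions yield values near zero.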
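The Open Datasets row notes that the Breast Cancer experiment is Bayesian logistic regression, i.e. the variational target is an unnormalized log-posterior. As a sketch of that kind of target (the Gaussian prior variance and the toy data below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def log_posterior(w, X, y, prior_var=100.0):
    """Unnormalized log-posterior for Bayesian logistic regression:
    Gaussian prior N(0, prior_var * I) plus Bernoulli log-likelihood, y in {0, 1}."""
    logits = X @ w
    # log sigma(logits) expressed via logaddexp for numerical stability
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * np.sum(w**2) / prior_var
    return log_lik + log_prior

# Hypothetical toy data standing in for a real design matrix and labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = (rng.random(20) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
print(log_posterior(true_w, X, y))
```

A GMM-based VI method would fit a mixture approximation to this density using only evaluations of `log_posterior` (and possibly its gradients).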