Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bridging the Gap Between f-divergences and Bayes Hilbert Spaces
Authors: Linus Lach, Alexander Fottner, Yarema Okhrin
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "By applying this variational estimation framework to f-GANs, we achieve improved FID scores over existing f-GAN architectures and competitive results with the Wasserstein GAN, highlighting its potential for both theoretical research and practical applications in learning theory." Supporting evidence includes Table 3 (comparison of GAN architectures with respect to shape and scale parameters), Table 4 (mean FID scores and two times the standard deviation on CIFAR-10 over 100 seeds; lower is better), and Figure 2 (CIFAR-10 results for BHSGAN, KLGAN, and Reverse KLGAN). |
| Researcher Affiliation | Academia | Linus Lach, Alexander Fottner, Yarema Okhrin — Department of Statistics and Data Science, University of Augsburg |
| Pseudocode | Yes | "In Section 5, we provide detailed explanations of the framework and model architectures, with pseudocode for the key algorithm included in A.3." Algorithm 1 describes the training procedure; its stated defaults were αg = αd = 0.0002 for all GANs except BHSGAN and REVKLGAN (αg = αd = 0.00005), nd = 2, ng = 2, m = 128 for MNIST and m = 64 for CIFAR-10, Nepochs = 50, dz = 100, and λ = 10 except for PEARSONGAN (λ = 20). |
| Open Source Code | No | The paper states: "We have taken several steps to ensure the reproducibility of our work. In Section 5, we provide detailed explanations of the framework and model architectures, with pseudocode for the key algorithm included in A.3." However, it does not provide an explicit statement of code release, a link to a repository, or mention of code in supplementary materials beyond pseudocode. |
| Open Datasets | Yes | We implemented the Bayes Hilbert space GAN (BHSGAN) and compared its results to traditional f-GANs, shown in Table 2, as well as a Wasserstein-GAN (Arjovsky et al., 2017) on the MNIST (Deng, 2012) and CIFAR10 (Krizhevsky et al., 2009) data sets. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets and refers to a 'training set' and 'batch size' in Algorithm 1, but it does not specify the exact percentages or sample counts for training/validation/test splits, nor does it explicitly reference the use of standard splits for these datasets within the main text. |
| Hardware Specification | Yes | All experiments were conducted on an AORUS RTX 4090 GPU (24 GB). |
| Software Dependencies | No | The paper mentions: "Using the PyTorch framework (Paszke et al., 2019)". While PyTorch is named, a specific version number (e.g., PyTorch 1.9) is not provided, only the publication year of its reference. |
| Experiment Setup | Yes | Default values for the training procedure were αg = αd = 0.0002 for all GANs except BHSGAN and REVKLGAN (αg = αd = 0.00005), nd = 2, ng = 2, m = 128 for MNIST and m = 64 for CIFAR-10, Nepochs = 50, dz = 100, and λ = 10 except for PEARSONGAN (λ = 20). |
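To make the reported defaults and their exceptions easier to scan, here is a minimal sketch that collects them into a single config helper. The field names (`lr_g`, `batch_size`, `grad_penalty`, and the function `config_for`) are our own labels, not from the paper; only the numeric values are taken from the "Experiment Setup" row above.

```python
# Sketch of the paper's reported training defaults as a config dict.
# Field names are illustrative; values are quoted from the paper.
BASE_CONFIG = {
    "lr_g": 2e-4,         # generator learning rate (alpha_g)
    "lr_d": 2e-4,         # discriminator learning rate (alpha_d)
    "n_d": 2,             # discriminator updates per iteration
    "n_g": 2,             # generator updates per iteration
    "epochs": 50,         # N_epochs
    "latent_dim": 100,    # d_z
    "grad_penalty": 10,   # lambda
}

def config_for(model: str, dataset: str) -> dict:
    """Apply the paper's stated exceptions to the base defaults."""
    cfg = dict(BASE_CONFIG)
    # Batch size m depends on the dataset.
    cfg["batch_size"] = 128 if dataset == "MNIST" else 64
    # BHSGAN and REVKLGAN use a smaller learning rate.
    if model in ("BHSGAN", "REVKLGAN"):
        cfg["lr_g"] = cfg["lr_d"] = 5e-5
    # PEARSONGAN uses lambda = 20 instead of 10.
    if model == "PEARSONGAN":
        cfg["grad_penalty"] = 20
    return cfg
```

This layout makes it easy to verify, model by model, which of the paper's exceptions apply to a given run.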