Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hierarchical VAE with a Diffusion-based VampPrior

Authors: Anna Kuzina, Jakub M. Tomczak

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our method on standard benchmark datasets (MNIST, OMNIGLOT, CIFAR10) and demonstrate improved training stability and latent space utilization. ... We report all results in Table 1, where we compare the proposed approach with other hierarchical VAEs. ... We conduct an extensive ablation study regarding pseudoinputs.
Researcher Affiliation | Academia | Anna Kuzina (EMAIL), Department of Computer Science, Vrije Universiteit Amsterdam, Netherlands; Jakub M. Tomczak (EMAIL), Department of Mathematics and Computer Science, Eindhoven University of Technology, Netherlands.
Pseudocode | Yes |
Algorithm 1 f_dct: Create DCT-based pseudoinputs
    Input: x ∈ R^(c×D×D), S ∈ R^(c×d×d), d ∈ R
    u_dct = DCT(x)
    u_dct = Crop(u_dct, d)
    u_dct = u_dct ⊘ S
    Return: u_dct ∈ R^(c×d×d)
...
Algorithm 2 f_dct⁻¹: Invert DCT-based pseudoinputs
    Input: u_dct ∈ R^(c×d×d), S ∈ R^(c×d×d), D
    u_dct = u_dct ⊙ S
    u_dct = zero_pad(u_dct, D - d)
    u_x = iDCT(u_dct)
    Return: u_x ∈ R^(c×D×D)
Open Source Code | Yes | We provide all the hyperparameters for training DVP-VAE in Appendix C.1 and in the code repository: https://github.com/AKuzina/dvp_vae
Open Datasets | Yes | We evaluate DVP-VAE on dynamically binarized MNIST (Le Cun, 1998) and OMNIGLOT (Lake et al., 2015). Furthermore, we conduct experiments on natural images using the CIFAR10 dataset (Alex, 2009).
Dataset Splits | No | The paper uses standard benchmark datasets (MNIST, OMNIGLOT, CIFAR10) and discusses training and validation losses, implying splits were used. However, it does not explicitly provide percentages, sample counts, or a methodology for splitting the data into training, validation, and test sets.
Hardware Specification | Yes | For a bigger input size (CIFAR10 dataset) a model with more than 750 pseudoinputs does not fit into a single A100 GPU.
Software Dependencies | No | The paper mentions using the Adamax optimizer and a UNet implementation, but it does not specify version numbers for key software components such as the programming language (e.g., Python), the deep learning framework (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup | Yes | We provide all the hyperparameters for training DVP-VAE in Appendix C.1 and in the code repository. ... Table 7: Full list of hyperparameters.

    Optimization            MNIST     OMNIGLOT  CIFAR10
    # Epochs                300       500       3000
    Batch Size (per GPU)    250       250       128
    # GPUs                  1         1         1
    Optimizer               Adamax    Adamax    Adamax
    Scheduler               Cosine    Cosine    Cosine
    Starting LR             1e-2      1e-2      3e-3
    End LR                  1e-5      1e-4      1e-4
    LR warmup (epochs)      2         2         5
    Weight Decay            1e-6      1e-6      1e-6
    EMA rate                0.999     0.999     0.999
    Grad. Clipping          5         2         150
    log σ clipping          -10       -10       -10
    Latent Sizes ... Architecture ... Context Prior ...
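The DCT-based pseudoinput construction quoted above (DCT, crop to the low-frequency block, elementwise scaling; then the inverse) can be sketched in plain NumPy. This is a minimal sketch, not the authors' code: the function names, the orthonormal DCT-II implementation, and the choice of division for the forward scaling step (with multiplication in the inverse) are assumptions made so the two functions compose.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n); rows are cosine basis vectors.
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)  # scale row 0 so that C @ C.T == I
    return C

def dct2(x):
    # 2D DCT-II applied per channel: C x C^T (x has shape (c, D, D)).
    C = dct_matrix(x.shape[-1])
    return C @ x @ C.T

def idct2(u):
    # Inverse 2D DCT: C^T u C (exact inverse because C is orthonormal).
    C = dct_matrix(u.shape[-1])
    return C.T @ u @ C

def make_pseudoinput(x, S, d):
    # Algorithm 1 sketch: DCT, crop to the top-left d x d block, scale by S.
    u = dct2(x)
    u = u[..., :d, :d]        # Crop(u, d): keep low-frequency coefficients
    return u / S              # elementwise scaling (division is an assumption)

def invert_pseudoinput(u, S, D):
    # Algorithm 2 sketch: undo the scaling, zero-pad back to D x D, inverse DCT.
    u = u * S
    d = u.shape[-1]
    pad = [(0, 0)] * (u.ndim - 2) + [(0, D - d), (0, D - d)]
    return idct2(np.pad(u, pad))
```

With d = D the round trip is lossless; with d < D it keeps only the low-frequency content of x, which is the point of the pseudoinput construction.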
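The "Cosine" scheduler with "LR warmup (epochs)" in Table 7 can be read as linear warmup to the starting LR followed by cosine decay to the end LR. The sketch below encodes that reading; the exact schedule is an assumption here and lives in the authors' repository.

```python
import math

def lr_at_epoch(epoch, total_epochs, warmup_epochs, start_lr, end_lr):
    # Linear warmup to start_lr, then cosine decay to end_lr
    # (assumed interpretation of Table 7's Scheduler + LR warmup rows).
    if epoch < warmup_epochs:
        return start_lr * (epoch + 1) / warmup_epochs
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return end_lr + 0.5 * (start_lr - end_lr) * (1.0 + math.cos(math.pi * t))
```

For the MNIST column this would be `lr_at_epoch(epoch, total_epochs=300, warmup_epochs=2, start_lr=1e-2, end_lr=1e-5)`: the LR ramps to 1e-2 over the first 2 epochs and decays to 1e-5 by epoch 300.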