Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Improving Variational Autoencoder Estimation from Incomplete Data with Mixture Variational Families

Authors: Vaidotas Simkus, Michael U. Gutmann

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the proposed methods for VAE estimation on synthetic and realistic data sets with missing data (section 6).
Researcher Affiliation	Academia	Vaidotas Simkus EMAIL Michael U. Gutmann EMAIL School of Informatics University of Edinburgh
Pseudocode	Yes	Algorithm 1 Shared computation of the DeMissVAE learning objectives
Open Source Code	Yes	The methods are summarised in table 1 and the code implementation is available at https://github.com/vsimkus/demiss-vae.
Open Datasets	Yes	We here evaluate the proposed methods on real-world data sets from the UCI repository (Dua & Graff, 2017; Papamakarios et al., 2017).
Dataset Splits	No	The paper mentions evaluating on a 'complete test data set' and a '20K sample data set used to fit the VAEs' but does not provide specific split percentages or counts (e.g., train/validation/test splits). The missingness percentages (e.g., 20/50/80%) refer to data incompleteness, not dataset splits for training and evaluation.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or memory amounts used for running the experiments.
Software Dependencies	No	The paper mentions the 'AMSGrad optimiser (Reddi et al., 2018)' and 'STL gradients (Roeder et al., 2017)' which are algorithms/techniques, but it does not specify software libraries with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x).
Experiment Setup	Yes	We then fitted a VAE model with 2-dimensional latent space using diagonal Gaussian encoder and decoder distributions, and a fixed standard Normal prior. For the decoder and encoder networks we used fully-connected residual neural networks with 3 residual blocks, 200 hidden dimensions, and ReLU activations. To optimise the model parameters we have used AMSGrad optimiser (Reddi et al., 2018) with a learning rate of 10⁻³ for a total of 500 epochs.
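
The Experiment Setup entry names the AMSGrad optimiser with a learning rate of 10⁻³. As a rough illustration of what that update rule does, the sketch below applies a simplified scalar AMSGrad step to a toy quadratic. It is not taken from the paper's code: the names `AMSGradState`, `amsgrad_step`, and `minimise_quadratic`, the toy objective, and the values of `beta1`, `beta2`, and `eps` are all assumptions chosen for illustration, and library implementations may differ in details such as bias correction.

```python
import math
from dataclasses import dataclass


@dataclass
class AMSGradState:
    """Per-parameter optimiser state (hypothetical helper for this sketch)."""
    m: float = 0.0      # exponential moving average of gradients
    v: float = 0.0      # exponential moving average of squared gradients
    v_hat: float = 0.0  # running maximum of v, the AMSGrad modification


def amsgrad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one simplified AMSGrad update to a scalar parameter.

    lr=1e-3 matches the learning rate quoted in the table; beta1, beta2,
    and eps are assumed values, not taken from the paper.
    """
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    # Keeping the maximum means the effective step size never grows
    # when recent gradients become small.
    state.v_hat = max(state.v_hat, state.v)
    return theta - lr * state.m / (math.sqrt(state.v_hat) + eps)


def minimise_quadratic(steps=100, start=1.0):
    """Run AMSGrad on the toy objective f(theta) = theta**2."""
    theta = start
    state = AMSGradState()
    for _ in range(steps):
        grad = 2.0 * theta  # derivative of theta**2
        theta = amsgrad_step(theta, grad, state)
    return theta


if __name__ == "__main__":
    print(minimise_quadratic())
```

The only difference from a plain Adam-style step in this sketch is the `max` on the second-moment estimate; in the paper's setting the same kind of update would be applied to every VAE parameter rather than to a single scalar.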