Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets
Authors: Pierre-Alexandre Mattei, Jes Frellsen
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate our approach by training a convolutional DLVM on incomplete static binarisations of MNIST. Moreover, on various continuous data sets, we show that MIWAE provides extremely accurate single imputations, and is highly competitive with state-of-the-art methods. |
| Researcher Affiliation | Academia | Department of Computer Science, IT University of Copenhagen, Denmark. Correspondence to: Pierre-Alexandre Mattei <EMAIL>, Jes Frellsen <EMAIL>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Methods are described in prose. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology. There is no link or explicit statement of code release. |
| Open Datasets | Yes | We illustrate the features of MIWAE by training a DLVM on an incomplete version of the static binarisation of MNIST. We consider a simple setting: with 50% of the pixels missing uniformly at random (in a MCAR fashion). (Dua & Efi, 2017). URL http://archive.ics.uci.edu/ml. |
| Dataset Splits | Yes | To compare models, we evaluate estimates of their test log-likelihood obtained using importance sampling with 5000 samples and an inference network refitted on the test set, as suggested by Cremer et al. (2018) and Mattei & Frellsen (2018b). |
| Hardware Specification | No | The paper does not provide specific hardware details (GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4). |
| Experiment Setup | Yes | The intrinsic dimension d is fixed to 10, which may be larger than the actual number of features in the data, but DLVMs are known to automatically ignore some latent dimensions (Dai et al., 2018); both encoder and decoder are multi-layer perceptrons with 3 hidden layers (with 128 hidden units) and tanh activations; we use products of Student's t for the variational family (following Domke & Sheldon, 2018) and the observation model (following Takahashi et al., 2018). We perform 500 000 gradient steps for all data sets; no regularisation scheme is used, but the observation model is constrained so that the eigenvalues of its covariances are larger than 0.01 (as suggested by Mattei & Frellsen, 2018a). |
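
The Open Datasets and Dataset Splits rows quote two steps that are easy to illustrate: removing 50% of the entries uniformly at random (MCAR), and estimating a log-likelihood from importance samples. The following is a minimal sketch of both in plain Python, not the authors' code (the table records that none was released). The function names `mcar_mask` and `log_mean_exp`, the seed, and the list-based data layout are assumptions made for illustration only.

```python
import math
import random


def mcar_mask(n_rows, n_cols, missing_rate=0.5, seed=0):
    """Hypothetical helper: boolean mask where True marks an observed entry.

    Each entry is dropped independently with probability `missing_rate`,
    regardless of its value, which is what MCAR means.
    """
    rng = random.Random(seed)
    return [
        [rng.random() >= missing_rate for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


def log_mean_exp(log_weights):
    """Hypothetical helper: stable log of the average of exp(log_weights).

    With K log importance weights for one data point, this is the usual
    importance-sampling estimate of that point's log-likelihood.
    """
    m = max(log_weights)  # subtract the max to avoid underflow in exp
    total = sum(math.exp(w - m) for w in log_weights)
    return m + math.log(total / len(log_weights))


if __name__ == "__main__":
    mask = mcar_mask(28, 28, missing_rate=0.5)
    observed = sum(sum(row) for row in mask) / (28 * 28)
    print(f"fraction observed: {observed:.3f}")
    # Very negative log weights would underflow without the max shift.
    print(log_mean_exp([-1000.0, -1001.0, -1002.0]))
```

In the quoted evaluation, the average would be taken over 5000 importance samples per test point. The sketch leaves out the model itself, so the log weights here are placeholder numbers.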