not-MIWAE: Deep Generative Modelling with Missing not at Random Data

Authors: Niels Bruun Ipsen, Pierre-Alexandre Mattei, Jes Frellsen

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we apply the not-MIWAE to problems with values MNAR: censoring in multivariate datasets, clipping in images and selection bias in recommender systems. Implementation details and a link to source code can be found in appendix A.
Researcher Affiliation | Academia | Niels Bruun Ipsen (nbip@dtu.dk), Pierre-Alexandre Mattei (pierre-alexandre.mattei@inria.fr), Jes Frellsen (jefr@dtu.dk). Department of Applied Mathematics and Computer Science, Technical University of Denmark, Denmark; Université Côte d'Azur, Inria (Maasai team), Laboratoire J.A. Dieudonné, UMR CNRS 7351, France.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at: https://github.com/nbip/notMIWAE
Open Datasets | Yes | We compare different imputation techniques on datasets from the UCI database (Dua & Graff, 2017), street view house numbers dataset (SVHN, Netzer et al., 2011) and The Yahoo! R3 dataset (webscope.sandbox.yahoo.com).
Dataset Splits | No | While the Yahoo! R3 dataset comes with separate training and test sets, the paper does not provide the split percentages, counts, or splitting methodology (including validation splits) for any of the datasets (UCI, SVHN, Yahoo! R3) that would be needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments; it only names the general software frameworks used.
Software Dependencies | No | The paper mentions using 'TensorFlow probability (Dillon et al., 2017) and the Adam optimizer (Kingma & Ba, 2014)' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The encoder and decoder consist of two hidden layers with 128 units and tanh activation functions. ... The size of the latent space is set to p − 1, K = 20 importance samples were used during training and a batch size of 16 was used for 100k iterations. ... K = 5 importance samples were used during training and a batch size of 64 was used for 1M iterations. ... We use K = 20 importance samples during training, ReLU activations, a batch size of 100 and train for 10k iterations.
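
The Experiment Setup row above quotes the architecture and training hyperparameters for the UCI experiments (two tanh hidden layers of 128 units, latent size p − 1, K = 20 importance samples, batch size 16, 100k iterations). The sketch below shows what that configuration might look like; it assumes a Keras-style model definition with a Gaussian observation model, and the feature count, output parameterisation, and learning rate are illustrative placeholders rather than values stated in the paper. The authors' actual implementation (built on TensorFlow Probability) is at https://github.com/nbip/notMIWAE.

```python
# Minimal sketch of the quoted UCI configuration; the Gaussian output head,
# the feature count p, and the default Adam learning rate are assumptions,
# not values stated in the paper.
import tensorflow as tf

p = 8                    # number of features (illustrative; dataset-dependent)
latent_dim = p - 1       # "The size of the latent space is set to p - 1"
K = 20                   # importance samples during training
batch_size = 16
n_iterations = 100_000

def make_mlp(out_dim):
    """Two hidden layers with 128 units and tanh activation functions."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="tanh"),
        tf.keras.layers.Dense(128, activation="tanh"),
        tf.keras.layers.Dense(out_dim),
    ])

# Encoder maps (imputed) inputs to the mean and log-variance of q(z | x);
# decoder maps latent samples to the parameters of a Gaussian p(x | z).
encoder = make_mlp(2 * latent_dim)
decoder = make_mlp(2 * p)

optimizer = tf.keras.optimizers.Adam()  # Adam (Kingma & Ba, 2014); learning rate not reported
```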
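
The MNAR settings listed in the Research Type row (censoring, clipping, selection bias) share the property that missingness can depend on the unobserved values themselves. As a concrete illustration, the snippet below simulates a generic self-masking censoring mechanism; the threshold-at-the-feature-mean rule is an assumption chosen for illustration and is not necessarily the exact mechanism used in the paper's experiments.

```python
# Illustrative self-masking (MNAR) censoring: whether an entry is observed
# depends on its own value.  The mean-threshold rule is an assumption for
# illustration, not the paper's exact missing-data mechanism.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))      # fully observed data matrix

mask = X <= X.mean(axis=0)          # True where the value is observed
X_obs = np.where(mask, X, np.nan)   # NaN marks the censored (missing) entries

print(f"fraction observed: {mask.mean():.2f}")
```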