Generalized Multimodal ELBO

Authors: Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.
Researcher Affiliation | Academia | Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The detailed architectures can also be looked up in the released code.
Open Datasets | Yes | We introduce a new dataset called PolyMNIST with 5 simplified modalities. Additionally, we evaluate all models on the trimodal matching digits dataset MNIST-SVHN-Text and the challenging bimodal CelebA dataset with images and text. The latter two were introduced in Sutter et al. (2020).
Dataset Splits | Yes | In total there are 60,000 tuples of training examples and 10,000 tuples of test examples and we make sure that no two MNIST digits were used in both the training and test set. (A split-construction sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing instance types used for experiments.
Software Dependencies | No | The paper mentions using "scikit-learn" and an "Adam optimizer (Kingma & Ba, 2014)" but does not provide specific version numbers for these software components or the deep learning framework used.
Experiment Setup | Yes | The latent space dimension is set to 20 for all modalities, models and runs. The results in tables 2 to 4 are generated with β = 5.0. We train all models for 150 epochs. ... We use an Adam optimizer ... with an initial learning rate 0.001. (A configuration sketch follows the table.)
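
The Dataset Splits row quotes 60,000 training tuples and 10,000 test tuples with no MNIST example shared across splits. The snippet below is only an illustrative sketch of building multimodal tuples independently per split so that this separation holds; the build_tuples helper and the spelled-out-digit text modality are hypothetical stand-ins, not the authors' released data pipeline.

```python
# Illustrative sketch (not the authors' pipeline) of building multimodal
# tuples per split so that no MNIST example appears in both training and
# test data. The text modality here is simply the digit spelled out.

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def build_tuples(mnist_split):
    """Pair every (image, label) in one split with a matching text modality."""
    return [(image, DIGIT_WORDS[label]) for image, label in mnist_split]

# Tuples are built separately from the canonical MNIST splits, preserving the
# 60,000 / 10,000 train/test separation of the underlying digits:
# train_tuples = build_tuples(mnist_train)  # mnist_train: list of (image, label)
# test_tuples  = build_tuples(mnist_test)
```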
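The Experiment Setup row fixes the main hyperparameters (latent dimension 20, β = 5.0, 150 epochs, Adam with initial learning rate 0.001). The following is a minimal sketch of that configuration, assuming a PyTorch-style training loop; the placeholder model and the negative_elbo callback are illustrative and not taken from the released code.

```python
# Minimal sketch of the quoted setup (latent dimension 20, beta = 5.0,
# 150 epochs, Adam with initial learning rate 0.001), assuming a
# PyTorch-style loop. The model below is a stand-in, not the released code.
import torch
import torch.nn as nn

LATENT_DIM = 20       # latent space dimension for all modalities, models, runs
BETA = 5.0            # beta weight on the KL term (results in tables 2 to 4)
NUM_EPOCHS = 150
LEARNING_RATE = 1e-3  # initial learning rate for Adam

model = nn.Linear(784, LATENT_DIM)  # placeholder for the multimodal VAE
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

def train(train_loader, negative_elbo):
    """Generic beta-weighted loop; `negative_elbo` is assumed to return
    (reconstruction_term, kl_term) for one batch."""
    for _ in range(NUM_EPOCHS):
        for batch in train_loader:
            optimizer.zero_grad()
            reconstruction, kl = negative_elbo(model, batch)
            loss = reconstruction + BETA * kl  # beta-weighted negative ELBO
            loss.backward()
            optimizer.step()
```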