Unity by Diversity: Improved Representation Learning for Multimodal VAEs

Authors: Thomas Sutter, Yang Meng, Andrea Agostini, Daphné Chopard, Norbert Fortin, Julia Vogt, Babak Shahbaba, Stephan Mandt

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive experiments on multiple benchmark datasets and two challenging real-world datasets, we show improved learned latent representations and imputation of missing data modalities compared to existing methods.
Researcher Affiliation | Academia | 1 Department of Computer Science, ETH Zurich; 2 Department of Intensive Care and Neonatology, University Children's Hospital Zurich; 3 Department of Statistics, UC Irvine; 4 Department of Neurobiology and Behavior, UC Irvine; 5 Department of Computer Science, UC Irvine
Pseudocode | No | The paper describes methods through text and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code for the experiments on the benchmark datasets can be found at https://github.com/thomassutter/mmvmvae; the code for the hippocampal neural activity experiments at https://github.com/yangmeng96/mmvmvae-hippocampal; and the code for the MIMIC-CXR experiments at https://github.com/agostini335/mmvmvae-mimic.
Open Datasets | Yes | PolyMNIST: data downloaded from https://drive.google.com/drive/folders/1lr-laYwjDq3AzalaIe9jN4shpt1wBsYM?usp=sharing and code from https://github.com/thomassutter/MoPoE. Bimodal CelebA: downloaded from https://drive.google.com/drive/folders/1lr-laYwjDq3AzalaIe9jN4shpt1wBsYM?usp=sharing. CUB image-captions: downloaded from http://www.robots.ox.ac.uk/~yshi/mmdgm/datasets/cub.zip. MIMIC-CXR: downloaded from https://physionet.org/content/mimic-cxr/2.0.0/. Hippocampal neural activity data: downloaded from https://datadryad.org/stash/dataset/doi:10.7280/D14X30
Dataset Splits | Yes | We split the dataset into distinct training (80%), validation (10%), and test (10%) sets based on subjects, thus ensuring that the same image or study cannot be present in multiple sets.
Hardware Specification | Yes | "We use NVIDIA GTX 2080 GPUs for all our runs." (PolyMNIST/CelebA) and "We use NVIDIA A100-SXM4-40GB GPUs for all our runs." (MIMIC-CXR)
Software Dependencies | No | "All code is written using Python 3.11, PyTorch [Paszke et al., 2019] and PyTorch Lightning [Falcon and The PyTorch Lightning team, 2019]. We use the scikit-learn [Pedregosa et al., 2011] package for the linear classifiers." (Appendix B.3) Only Python carries a version number; the other key libraries lack explicit versioning.
Experiment Setup | Yes | "For all experiments on this dataset, we use an Adam optimizer [Kingma and Ba, 2014] with an initial learning rate of 0.0005, and a batch size of 256. We train all models for 500 epochs." (PolyMNIST, Appendix B.4.2), with similarly detailed descriptions for the other datasets.
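The subject-level 80/10/10 split reported under Dataset Splits can be sketched with scikit-learn's `GroupShuffleSplit`. The paper only states the ratios and that splits are disjoint by subject, so this is one plausible implementation, not the authors' code; the `subject_split` helper and the toy subject IDs are hypothetical.

```python
# Sketch of a subject-disjoint 80/10/10 split (assumed mechanism;
# the paper only reports the ratios and the subject-level constraint).
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def subject_split(subject_ids, seed=0):
    subject_ids = np.asarray(subject_ids)
    idx = np.arange(len(subject_ids))
    # First carve off ~80% of subjects for training.
    gss = GroupShuffleSplit(n_splits=1, train_size=0.8, random_state=seed)
    train_idx, rest_idx = next(gss.split(idx, groups=subject_ids))
    # Split the remaining ~20% of subjects in half: validation vs. test.
    gss2 = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=seed)
    val_rel, test_rel = next(gss2.split(rest_idx, groups=subject_ids[rest_idx]))
    return train_idx, rest_idx[val_rel], rest_idx[test_rel]

subjects = np.repeat(np.arange(10), 5)  # toy data: 10 subjects, 5 samples each
train, val, test = subject_split(subjects)
```

Because the split is done over subject groups rather than individual samples, no subject (and hence no image or study) can land in more than one set.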
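The scikit-learn linear classifiers mentioned under Software Dependencies (Appendix B.3) are used to evaluate learned latent representations. A minimal linear-probe sketch of that evaluation is below; the 16-dimensional features here are random stand-ins for VAE latents, and the label rule is invented purely to make the example self-contained.

```python
# Hedged sketch of a linear-probe evaluation on latent representations,
# using a scikit-learn linear classifier as the paper does (Appendix B.3).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Stand-ins for encoder outputs; in the paper these would be VAE latents.
z_train = rng.normal(size=(200, 16))
y_train = (z_train[:, 0] > 0).astype(int)  # invented label rule for the demo
z_test = rng.normal(size=(100, 16))
y_test = (z_test[:, 0] > 0).astype(int)

# Train a linear classifier on the (frozen) representations and score it.
clf = LogisticRegression(max_iter=1000).fit(z_train, y_train)
acc = accuracy_score(y_test, clf.predict(z_test))
```

Higher probe accuracy indicates that the label-relevant information is linearly decodable from the latent space, which is the property the paper's representation comparisons rely on.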