Unity by Diversity: Improved Representation Learning for Multimodal VAEs
Authors: Thomas Sutter, Yang Meng, Andrea Agostini, Daphné Chopard, Norbert Fortin, Julia Vogt, Babak Shahbaba, Stephan Mandt
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive experiments on multiple benchmark datasets and two challenging real-world datasets, we show improved learned latent representations and imputation of missing data modalities compared to existing methods. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, ETH Zurich; (2) Department of Intensive Care and Neonatology, University Children's Hospital Zurich; (3) Department of Statistics, UC Irvine; (4) Department of Neurobiology and Behavior, UC Irvine; (5) Department of Computer Science, UC Irvine |
| Pseudocode | No | The paper describes methods through text and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for the experiments on the benchmark datasets can be found here: https://github.com/thomassutter/mmvmvae. The code for the hippocampal neural activity experiments can be found here: https://github.com/yangmeng96/mmvmvae-hippocampal. The code for the MIMIC-CXR experiments can be found here: https://github.com/agostini335/mmvmvae-mimic |
| Open Datasets | Yes | PolyMNIST: downloaded the data from https://drive.google.com/drive/folders/1lr-laYwjDq3AzalaIe9jN4shpt1wBsYM?usp=sharing and the code from https://github.com/thomassutter/MoPoE... Bimodal CelebA: downloaded from https://drive.google.com/drive/folders/1lr-laYwjDq3AzalaIe9jN4shpt1wBsYM?usp=sharing... CUB image-captions: downloaded from http://www.robots.ox.ac.uk/~yshi/mmdgm/datasets/cub.zip... MIMIC-CXR: downloaded from https://physionet.org/content/mimic-cxr/2.0.0/... Hippocampal Neural Activity data: downloaded from https://datadryad.org/stash/dataset/doi:10.7280/D14X30 |
| Dataset Splits | Yes | We split the dataset into distinct training (80%), validation (10%), and test (10%) sets based on subjects, thus ensuring that the same image or study cannot be present in multiple sets. (A sketch of such a subject-level split appears after this table.) |
| Hardware Specification | Yes | We use NVIDIA GTX 2080 GPUs for all our runs. (PolyMNIST/CelebA); We use NVIDIA A100-SXM4-40GB GPUs for all our runs. (MIMIC-CXR) |
| Software Dependencies | No | All code is written using Python 3.11, PyTorch [Paszke et al., 2019] and PyTorch Lightning [Falcon and The PyTorch Lightning team, 2019]. We use the scikit-learn [Pedregosa et al., 2011] package for the linear classifiers. (Appendix B.3). Only Python carries an explicit version number; the other key libraries are cited without versions. |
| Experiment Setup | Yes | For all experiments on this dataset, we use an Adam optimizer [Kingma and Ba, 2014] with an initial learning rate of 0.0005, and a batch size of 256. We train all models for 500 epochs. (PolyMNIST, Appendix B.4.2) and similar detailed descriptions for other datasets. (A minimal training-loop sketch with these hyperparameters appears after this table.) |
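The subject-level split quoted in the Dataset Splits row can be illustrated in a few lines. The sketch below is our own, not the authors' implementation: the function name `subject_level_split`, the use of NumPy, and the fixed seed are assumptions; only the 80/10/10 fractions and the split-by-subject rule come from the paper.

```python
import numpy as np

def subject_level_split(subject_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Split sample indices into train/val/test by subject so that no
    subject (and hence no image or study) lands in more than one set."""
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)
    n_train = int(train_frac * len(subjects))
    n_val = int(val_frac * len(subjects))
    train_subj = set(subjects[:n_train])
    val_subj = set(subjects[n_train:n_train + n_val])
    splits = {"train": [], "val": [], "test": []}
    for idx, subj in enumerate(subject_ids):
        if subj in train_subj:
            splits["train"].append(idx)
        elif subj in val_subj:
            splits["val"].append(idx)
        else:
            splits["test"].append(idx)
    return splits

# Example: six samples from four subjects; every subject ends up in exactly one set.
print(subject_level_split(np.array(["s1", "s1", "s2", "s3", "s3", "s4"])))
```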
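Likewise, the training configuration in the Experiment Setup row maps onto a standard PyTorch loop. Only the optimizer (Adam), learning rate (0.0005), batch size (256), and epoch count (500) are taken from the paper; the model and data below are toy stand-ins, not the authors' multimodal VAE.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and random data; the real experiments train a multimodal
# VAE on PolyMNIST, which is not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, 64))
train_set = TensorDataset(torch.randn(1024, 3, 28, 28))
loader = DataLoader(train_set, batch_size=256, shuffle=True)  # batch size 256

optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)  # Adam, lr 0.0005
for epoch in range(500):  # 500 epochs, as in Appendix B.4.2
    for (x,) in loader:
        optimizer.zero_grad()
        loss = model(x).pow(2).mean()  # placeholder for the VAE objective
        loss.backward()
        optimizer.step()
```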