Learning Factorized Multimodal Representations

Authors: Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our model is able to learn meaningful multimodal representations that achieve state-of-the-art or competitive performance on six multimodal datasets.
Researcher Affiliation | Academia | Machine Learning Department and Language Technologies Institute, Carnegie Mellon University
Pseudocode | No | The paper describes its models and methods textually and through diagrams, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | Details are provided in the appendix and the code is available at <anonymous>.
Open Datasets | Yes | SVHN and MNIST are images with different styles but the same labels (digits 0-9). We randomly pair 100,000 SVHN and MNIST images that have the same label, creating a multimodal dataset which we call SVHN+MNIST. 80,000 pairs are used for training and the rest for testing. (A pairing sketch follows the table.)
Dataset Splits | No | The paper specifies a training and testing split (80,000 pairs for training and the rest for testing) for the SVHN+MNIST dataset, but does not mention a validation split.
Hardware Specification | Yes | We would also like to acknowledge NVIDIA's GPU support.
Software Dependencies | No | The paper mentions various models and networks (e.g., LSTMs, MFN) and specific tools for feature extraction (Facet, COVAREP), but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used for implementation.
Experiment Setup | Yes | All baseline models were retrained with extensive hyperparameter search for fair comparison.
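The Open Datasets row above describes how the SVHN+MNIST dataset is built, but the paper's code is not publicly linked. Below is a minimal sketch of that pairing procedure, assuming torchvision's SVHN and MNIST loaders; the 100,000-pair count and the 80,000/20,000 train/test split come from the quoted description, while the loader choices, random seed, and sampling with replacement are assumptions.

```python
import random
import numpy as np
from torchvision import datasets

# Load the two image modalities (training splits; an assumption, since the
# paper does not state which SVHN split was used).
svhn = datasets.SVHN(root="data", split="train", download=True)
mnist = datasets.MNIST(root="data", train=True, download=True)

# Group image indices by digit label (0-9) for each dataset.
svhn_by_label = {d: np.where(svhn.labels == d)[0].tolist() for d in range(10)}
mnist_by_label = {d: np.where(mnist.targets.numpy() == d)[0].tolist() for d in range(10)}

# Randomly pair 100,000 SVHN/MNIST images that share the same label
# (sampling with replacement is an assumption; the seed is arbitrary).
rng = random.Random(0)
pairs = []
for _ in range(100_000):
    d = rng.randrange(10)                # pick a digit class
    i = rng.choice(svhn_by_label[d])     # an SVHN index with that label
    j = rng.choice(mnist_by_label[d])    # an MNIST index with that label
    pairs.append((i, j, d))

# 80,000 pairs for training, the remaining 20,000 for testing.
rng.shuffle(pairs)
train_pairs, test_pairs = pairs[:80_000], pairs[80_000:]
```

Sampling with replacement is assumed because neither standard training split contains 100,000 images on its own (SVHN train has 73,257 images, MNIST 60,000); the paper does not say how duplicate pairings were handled.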