Learning Factorized Multimodal Representations
Authors: Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, Ruslan Salakhutdinov
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our model is able to learn meaningful multimodal representations that achieve state-of-the-art or competitive performance on six multimodal datasets. |
| Researcher Affiliation | Academia | Machine Learning Department and Language Technologies Institute, Carnegie Mellon University |
| Pseudocode | No | The paper describes its models and methods textually and through diagrams, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | Details are provided in the appendix and the code is available at <anonymous>. |
| Open Datasets | Yes | SVHN and MNIST are images with different styles but the same labels (digits 0-9). We randomly pair 100,000 SVHN and MNIST images that have the same label, creating a multimodal dataset which we call SVHN+MNIST. 80,000 pairs are used for training and the rest for testing. (An illustrative pairing sketch follows the table.) |
| Dataset Splits | No | The paper specifies a training and testing split (80,000 pairs for training and the rest for testing) for the SVHN+MNIST dataset, but does not mention a validation split. |
| Hardware Specification | Yes | We would also like to acknowledge NVIDIA's GPU support. |
| Software Dependencies | No | The paper mentions various models and networks (e.g., LSTMs, MFN) and specific tools for feature extraction (Facet, COVAREP), but it does not provide specific version numbers for any software libraries, frameworks, or programming languages used for implementation. |
| Experiment Setup | Yes | All baseline models were retrained with extensive hyperparameter search for fair comparison. |
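The SVHN+MNIST construction quoted in the Open Datasets row (random same-label pairing, 80,000 training pairs and the remainder for testing) can be illustrated with a minimal sketch. This is not the authors' released code; it assumes `torchvision` is available, samples pairs with replacement, and uses the hypothetical helper name `build_svhn_mnist_pairs`.

```python
# Hypothetical sketch of the SVHN+MNIST pairing described in the paper,
# assuming torchvision; not the authors' implementation.
import random
from collections import defaultdict

import torchvision


def build_svhn_mnist_pairs(root="./data", n_pairs=100_000, n_train=80_000, seed=0):
    mnist = torchvision.datasets.MNIST(root, train=True, download=True)
    svhn = torchvision.datasets.SVHN(root, split="train", download=True)

    # Group example indices by digit label for each modality.
    mnist_by_label, svhn_by_label = defaultdict(list), defaultdict(list)
    for i, y in enumerate(mnist.targets.tolist()):
        mnist_by_label[y].append(i)
    for i, y in enumerate(svhn.labels.tolist()):
        svhn_by_label[y].append(i)

    # Randomly pair an SVHN image and an MNIST image that share a label.
    # Sampling with replacement is an assumption made for this sketch.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        y = rng.randrange(10)
        pairs.append((rng.choice(svhn_by_label[y]),
                      rng.choice(mnist_by_label[y]),
                      y))

    # 80,000 pairs for training, the remaining 20,000 for testing.
    return pairs[:n_train], pairs[n_train:]


train_pairs, test_pairs = build_svhn_mnist_pairs()
```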