Gated Multimodal Units for Information Fusion
Authors: John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro f-score performance of single-modality approaches and outperformed other fusion strategies, including mixture of experts models. Section 4 presents the experimental evaluation setup along with the details of the MM-IMDb dataset. Section 5 shows and discusses the results for movie genre classification. |
| Researcher Affiliation | Academia | John Arevalo, Dept. of Computing Systems and Industrial Engineering, Universidad Nacional de Colombia; Thamar Solorio, Dept. of Computer Science, University of Houston; Manuel Montes-y-Gómez, Instituto Nacional de Astrofísica, Óptica y Electrónica; Fabio A. González, Dept. of Computing Systems and Industrial Engineering, Universidad Nacional de Colombia |
| Pseudocode | No | The paper includes mathematical equations for the GMU model and a diagram illustrating its structure (Figure 2), but it does not provide pseudocode or a clearly labeled algorithm block with structured steps. (A minimal code sketch of the bimodal GMU is given after the table.) |
| Open Source Code | Yes | All the implementation was carried on with the Blocks framework (Van Merrienboer et al., 2015). Code: https://github.com/johnarevalo/gmu-mmimdb |
| Open Datasets | Yes | Along with this work, the MM-IMDb dataset is released which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies. With this work we will make publicly available the Multimodal IMDb (MM-IMDb) dataset. Dataset: http://lisi1.unal.edu.co/mmimdb/ |
| Dataset Splits | Yes | The MM-IMDb dataset has been split in three subsets. Train, development and test subsets contain 15552, 2608 and 7799 samples, respectively. The sampling was stratified so that the training, dev and test sets comprise 60%, 10% and 30% of the samples of each genre, respectively. (A sketch of such a stratified split is given after the table.) |
| Hardware Specification | Yes | The authors also thank NVIDIA for the donated Tesla K40 GPU, which was used for some representation learning experiments. |
| Software Dependencies | No | The paper mentions that "All the implementation was carried on with the Blocks framework (Van Merrienboer et al., 2015)", but it does not specify the version number for Blocks or any other software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | Stochastic gradient descent with ADAM optimization (Kingma & Ba, 2014) was used to learn the weights of the neural network. Dropout and max-norm regularization were used to control overfitting. Hidden size ({64, 128, 256, 512}), learning rate (10⁻³, 10⁻¹), dropout ([0.3, 0.7]), max-norm ([5, 20]) and initialization ranges (10⁻³, 10⁻¹) parameters were explored by training 25 models with random (uniform) hyperparameter initializations, and the best was chosen according to validation performance. (A sketch of this random search is given after the table.) |
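
For reference, the fusion mechanism assessed above is compact enough to state in a few lines. Below is a minimal NumPy sketch of the bimodal GMU as described by the paper's equations (a tanh projection per modality and a sigmoid gate over the concatenated inputs); it is not the authors' Blocks implementation, and the input and hidden dimensions are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu_bimodal(x_v, x_t, W_v, W_t, W_z):
    """Bimodal Gated Multimodal Unit (per the paper's model equations).

    x_v, x_t : visual / textual input vectors
    W_v, W_t : per-modality projections (hidden_size x input_size)
    W_z      : gate matrix applied to the concatenated inputs
    """
    h_v = np.tanh(W_v @ x_v)                        # visual hidden representation
    h_t = np.tanh(W_t @ x_t)                        # textual hidden representation
    z = sigmoid(W_z @ np.concatenate([x_v, x_t]))   # gate activations in (0, 1)
    return z * h_v + (1.0 - z) * h_t                # per-dimension convex combination

# Illustrative shapes only: 512 is one of the hidden sizes explored in the paper.
rng = np.random.default_rng(0)
x_v, x_t = rng.normal(size=4096), rng.normal(size=300)
W_v = rng.normal(scale=1e-2, size=(512, 4096))
W_t = rng.normal(scale=1e-2, size=(512, 300))
W_z = rng.normal(scale=1e-2, size=(512, 4096 + 300))
h = gmu_bimodal(x_v, x_t, W_v, W_t, W_z)            # fused 512-d representation
```

Because the gate z is a vector, the unit weights the two modalities independently per hidden dimension, rather than with a single scalar as a basic mixture-of-experts gate would.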
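
The 60/10/30 stratified split reported in the Dataset Splits row can be approximated as follows. This sketch uses scikit-learn and stratifies on a single placeholder label per movie; MM-IMDb genres are multilabel, so an exact reproduction would need an iterative multilabel stratifier. The seed and genre count are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n = 25959                     # 15552 + 2608 + 7799 samples in MM-IMDb
indices = np.arange(n)
labels = np.random.default_rng(0).integers(0, 23, size=n)  # placeholder single genre

# 60% train vs. 40% rest, then split the rest 1:3 into dev (10%) and test (30%).
train_idx, rest_idx = train_test_split(
    indices, test_size=0.4, stratify=labels, random_state=0)
dev_idx, test_idx = train_test_split(
    rest_idx, test_size=0.75, stratify=labels[rest_idx], random_state=0)
print(len(train_idx), len(dev_idx), len(test_idx))  # roughly 15575, 2596, 7788
```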
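
The hyperparameter exploration in the Experiment Setup row is a plain random search. Below is a minimal sketch that samples 25 configurations from the reported ranges; the paper says only "random (uniform)", so linear rather than log-scale sampling of the learning rate and initialization range is an assumption here.

```python
import random

random.seed(0)

def sample_config():
    """One random draw over the ranges reported in the paper."""
    return {
        "hidden_size": random.choice([64, 128, 256, 512]),
        "learning_rate": random.uniform(1e-3, 1e-1),   # linear sampling assumed
        "dropout": random.uniform(0.3, 0.7),
        "max_norm": random.uniform(5, 20),
        "init_range": random.uniform(1e-3, 1e-1),      # linear sampling assumed
    }

# Train one model per configuration; keep the best by validation macro f-score.
configs = [sample_config() for _ in range(25)]
```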