Gated Multimodal Units for Information Fusion

Authors: John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The model was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro F-score over single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Section 4 presents the experimental evaluation setup along with the details of the MM-IMDb dataset, and Section 5 presents and discusses the results for movie genre classification.
Researcher Affiliation | Academia | John Arevalo and Fabio A. González: Dept. of Computing Systems and Industrial Engineering, Universidad Nacional de Colombia; Thamar Solorio: Dept. of Computer Science, University of Houston; Manuel Montes-y-Gómez: Instituto Nacional de Astrofísica, Óptica y Electrónica.
Pseudocode | No | The paper includes mathematical equations for the GMU model and a diagram illustrating its structure (Figure 2), but it does not provide pseudocode or a clearly labeled algorithm block with structured steps. (A hedged code sketch of the published equations is given after this table.)
Open Source Code | Yes | "All the implementation was carried on with the Blocks framework (Van Merrienboer et al., 2015)." The code is available at https://github.com/johnarevalo/gmu-mmimdb.
Open Datasets | Yes | The paper releases the Multimodal IMDb (MM-IMDb) dataset which, to the authors' knowledge, is the largest publicly available multimodal dataset for genre prediction on movies. It is available at http://lisi1.unal.edu.co/mmimdb/.
Dataset Splits | Yes | The MM-IMDb dataset has been split into three subsets: the train, development, and test subsets contain 15,552, 2,608, and 7,799 samples, respectively. The sampling was stratified so that the train, dev, and test sets comprise 60%, 10%, and 30% of the samples of each genre, respectively. (A sketch of one way to reproduce such a split is given after this table.)
Hardware Specification | Yes | The authors thank NVIDIA for a donated Tesla K40 GPU, which was used for some of the representation learning experiments.
Software Dependencies | No | The paper states that "All the implementation was carried on with the Blocks framework (Van Merrienboer et al., 2015)", but it does not specify the version of Blocks or of any other software dependency required to reproduce the experiments.
Experiment Setup | Yes | Stochastic gradient descent with ADAM optimization (Kingma & Ba, 2014) was used to learn the weights of the neural network, and dropout and max-norm regularization were used to control overfitting. Hidden size (in {64, 128, 256, 512}), learning rate (in (10^-3, 10^-1)), dropout (in [0.3, 0.7]), max-norm (in [5, 20]), and initialization range (in (10^-3, 10^-1)) were explored by training 25 models with random (uniform) hyperparameter samples, and the best model was chosen according to validation performance. (A sketch of this search loop is given after this table.)
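
The GMU itself is defined in the paper only by its equations (Section 3): h_v = tanh(W_v x_v), h_t = tanh(W_t x_t), z = sigmoid(W_z [x_v; x_t]), h = z * h_v + (1 - z) * h_t. Below is a minimal sketch of the bimodal case. The original implementation used the Blocks framework, so this PyTorch re-expression, including the module and parameter names and the omission of bias terms, is an illustrative assumption rather than the authors' code.

```python
import torch
import torch.nn as nn

class GatedMultimodalUnit(nn.Module):
    """Bimodal GMU following the paper's equations (names are ours)."""

    def __init__(self, visual_dim: int, text_dim: int, hidden_dim: int):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim, bias=False)
        self.text_proj = nn.Linear(text_dim, hidden_dim, bias=False)
        # The gate sees the concatenation of both raw modality inputs.
        self.gate = nn.Linear(visual_dim + text_dim, hidden_dim, bias=False)

    def forward(self, x_v: torch.Tensor, x_t: torch.Tensor) -> torch.Tensor:
        h_v = torch.tanh(self.visual_proj(x_v))   # visual hidden features
        h_t = torch.tanh(self.text_proj(x_t))     # textual hidden features
        z = torch.sigmoid(self.gate(torch.cat([x_v, x_t], dim=-1)))
        return z * h_v + (1 - z) * h_t            # gated convex combination

# Usage: fuse e.g. a 4096-d poster feature with a 300-d plot feature.
gmu = GatedMultimodalUnit(visual_dim=4096, text_dim=300, hidden_dim=512)
h = gmu(torch.randn(8, 4096), torch.randn(8, 300))  # -> shape (8, 512)
```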
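
The paper reports a stratified 60/10/30 split but does not describe the stratification algorithm used for the multilabel genre annotations. A minimal sketch of one way to obtain such a split, assuming scikit-multilearn's iterative stratification and placeholder label data, follows:

```python
import numpy as np
from skmultilearn.model_selection import iterative_train_test_split

# Placeholder data: 25,959 movies (15,552 + 2,608 + 7,799) with 23 binary
# genre labels; replace with the real MM-IMDb features and annotations.
X = np.arange(25959).reshape(-1, 1)
y = np.random.randint(0, 2, size=(25959, 23))

# First carve off the 40% that will become dev + test ...
X_train, y_train, X_rest, y_rest = iterative_train_test_split(X, y, test_size=0.4)
# ... then split that 40% into 10% dev and 30% test (0.75 of the remainder).
X_dev, y_dev, X_test, y_test = iterative_train_test_split(X_rest, y_rest, test_size=0.75)
```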
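
Finally, the hyperparameter exploration described in the Experiment Setup row amounts to a 25-draw random search. The sketch below assumes plain uniform sampling over the raw ranges, per the paper's "random (uniform)" wording, and train_and_validate is a hypothetical stand-in for the actual Blocks training loop:

```python
import random

def train_and_validate(config):
    # Hypothetical stand-in: train a GMU with `config` and return the
    # validation macro F-score. A dummy score keeps the sketch runnable.
    return random.random()

def sample_config():
    # Ranges as reported in the paper; all draws are uniform.
    return {
        "hidden_size": random.choice([64, 128, 256, 512]),
        "learning_rate": random.uniform(1e-3, 1e-1),
        "dropout": random.uniform(0.3, 0.7),
        "max_norm": random.uniform(5.0, 20.0),
        "init_range": random.uniform(1e-3, 1e-1),
    }

best_score, best_config = float("-inf"), None
for _ in range(25):
    config = sample_config()
    score = train_and_validate(config)
    if score > best_score:
        best_score, best_config = score, config
print(best_config, best_score)
```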