Imagined Visual Representations as Multimodal Embeddings

Authors: Guillem Collell, Ted Zhang, Marie-Francine Moens

AAAI 2017

Reproducibility assessment (each entry gives the variable, the result, and the LLM response):

Research Type: Experimental
LLM Response: Using seven benchmark concept similarity tests we show that the mapped (or imagined) vectors not only help to fuse multimodal information, but also outperform strong unimodal baselines and state-of-the-art multimodal methods, thus exhibiting more human-like judgments.

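As a rough illustration of the fusion the quoted abstract describes, here is a minimal sketch of building a multimodal embedding from a word vector and its "imagined" visual vector. The L2 normalization of each view and the scikit-learn-style `predict` interface are assumptions for the sketch, not details stated in the paper.

```python
import numpy as np

def multimodal_embedding(word_vec, mapping):
    """Fuse a word embedding with its 'imagined' visual vector.

    `mapping` is any fitted text-to-vision regressor exposing a
    scikit-learn-style predict() (e.g., the linear mapping sketched
    later in this report); L2-normalizing each view before
    concatenation is an assumption, not a detail from the paper.
    """
    imagined = mapping.predict(word_vec.reshape(1, -1)).ravel()
    word_part = word_vec / np.linalg.norm(word_vec)
    visual_part = imagined / np.linalg.norm(imagined)
    return np.concatenate([word_part, visual_part])
```
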
Researcher Affiliation: Academia
LLM Response: Guillem Collell, Ted Zhang, and Marie-Francine Moens, Computer Science Department, KU Leuven, 3001 Heverlee, Belgium. gcollell@kuleuven.be; tedz.cs@gmail.com; sien.moens@cs.kuleuven.be

Pseudocode: No
LLM Response: The paper does not contain any pseudocode or algorithm blocks.

Open Source Code: No
LLM Response: The paper states 'our embeddings are publicly available' with a link to http://liir.cs.kuleuven.be/software.php, which provides the embeddings (data) but not the source code for the method itself.

Open Datasets: Yes
LLM Response: We use 300-dimensional GloVe vectors (Pennington, Socher, and Manning 2014) pre-trained on the Common Crawl corpus... We use ImageNet (Russakovsky et al. 2015) as our source of visual information... We tested the proposed method with 7 benchmark tests, covering three different tasks: (i) General relatedness: MEN (Bruni, Tran, and Baroni 2014) and WordSim353-rel (Agirre et al. 2009); (ii) Semantic or taxonomic similarity: SemSim (Silberer and Lapata 2014), SimLex-999 (Hill, Reichart, and Korhonen 2015), WordSim353-sim (Agirre et al. 2009) and SimVerb-3500 (Gerz et al. 2016); (iii) Visual similarity: VisSim (Silberer and Lapata 2014).

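Benchmarks of this kind are conventionally scored by correlating a model's cosine similarities with human ratings. A minimal evaluation sketch follows, assuming the word pairs have already been parsed from a benchmark file such as MEN or SimLex-999 (the loading step is not shown and its format is assumed):

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate_benchmark(pairs, embeddings):
    """Spearman correlation between model and human similarity scores.

    `pairs` is a list of (word1, word2, human_score) tuples parsed from
    a benchmark file; `embeddings` maps each word to a numpy vector.
    Pairs with out-of-vocabulary words are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(score)
    rho, _ = spearmanr(model_scores, human_scores)
    return rho
```

The same protocol applies to all seven benchmarks listed above; only the word pairs and human ratings change.
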
Dataset Splits: No
LLM Response: The paper mentions training data, test sets, and a notion of 'zero-shot' words, but does not specify the explicit training/validation/test splits (percentages or absolute counts) needed for reproduction.

Hardware Specification: No
LLM Response: The paper mentions using a 'pre-trained VGG-m-128 CNN model', 'AlexNet CNN model', and 'ResNet CNN', but does not specify the hardware (e.g., GPU or CPU model, RAM) used to run the experiments or train these models.

Software Dependencies: No
LLM Response: All mappings are implemented with the scikit-learn toolkit (Pedregosa et al. 2011) and a pre-trained VGG-m-128 CNN model... implemented with the Matlab MatConvNet toolkit (Vedaldi and Lenc 2015). No version numbers are provided for scikit-learn or MatConvNet.

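Since the mappings are reported as implemented with scikit-learn, a hedged sketch of the linear text-to-vision mapping is shown below. Note that it uses an off-the-shelf least-squares fit rather than the paper's SGD-with-dropout training, and the per-concept averaging of CNN features over ImageNet images is an assumption about how the visual targets were built.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data standing in for the real inputs:
# X: (n_words, 300) GloVe vectors for the training concepts.
# Y: (n_words, 128) VGG-m-128 features, assumed to be averaged per
#    concept over its ImageNet images.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))
Y = rng.normal(size=(1000, 128))

# Closed-form multi-output least squares (not the paper's SGD training).
linear_map = LinearRegression().fit(X, Y)

imagined = linear_map.predict(X[:5])  # "imagined" visual vectors
print(imagined.shape)                 # (5, 128)
```
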
Experiment Setup: Yes
LLM Response: Both neural network and linear models are learned by stochastic gradient descent, and a total of nine parameter combinations are tested (learning rate = [0.1, 0.01, 0.005] and dropout rate = [0.5, 0.25, 0.1]). We report a linear model with a learning rate of 0.1 and a dropout rate of 0.1, running it for 175 epochs. For the neural network, we additionally test three different architectures with 50, 150 and 300 hidden units. We report a neural network with 300 hidden units, a dropout rate of 0.25 and a learning rate of 0.1, trained for 25 epochs.

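For concreteness, here is a self-contained numpy sketch of the reported neural-network configuration (300 hidden units, dropout rate 0.25, learning rate 0.1, 25 epochs of SGD). The Tanh activation and squared-error loss are assumptions, as the quoted setup does not specify them.

```python
import numpy as np

def train_mapping(X, Y, hidden=300, lr=0.1, dropout=0.25, epochs=25, seed=0):
    """One-hidden-layer mapping from word vectors X (n, 300) to visual
    vectors Y (n, 128), trained by per-sample SGD with inverted dropout
    on the hidden layer."""
    rng = np.random.default_rng(seed)
    d_in, d_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=0.01, size=(d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.01, size=(hidden, d_out))
    b2 = np.zeros(d_out)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x, y = X[i], Y[i]
            h = np.tanh(x @ W1 + b1)          # Tanh activation (assumed)
            # Inverted dropout: zero out units, rescale the survivors.
            mask = (rng.random(hidden) > dropout) / (1.0 - dropout)
            h_drop = h * mask
            err = (h_drop @ W2 + b2) - y      # gradient of 0.5 * squared error
            # Backpropagate through output layer, dropout mask, and Tanh.
            grad_W2 = np.outer(h_drop, err)
            grad_h = (W2 @ err) * mask * (1.0 - h ** 2)
            grad_W1 = np.outer(x, grad_h)
            W2 -= lr * grad_W2
            b2 -= lr * err
            W1 -= lr * grad_W1
            b1 -= lr * grad_h
    return W1, b1, W2, b2
```

At test time, dropout is disabled, so the imagined vector for a word vector g is simply np.tanh(g @ W1 + b1) @ W2 + b2.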