Learning Multimodal Word Representation via Dynamic Fusion Methods

Authors: Shaonan Wang, Jiajun Zhang, Chengqing Zong

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The extensive experiments have demonstrated that the proposed methods outperform strong unimodal baselines and state-of-the-art multimodal models.
Researcher Affiliation | Academia | National Laboratory of Pattern Recognition, CASIA, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The data and code for training and evaluation can be found at: https://github.com/wangshaonan/dynamicFusion
Open Datasets | Yes | We use 300-dimensional GloVe vectors which are trained on the Common Crawl corpus... Our visual vectors are collected from ImageNet (Russakovsky et al. 2015)... The training dataset is selected from about 20,000 word association pairs... The dataset is collected by (De Deyne, Perfors, and Navarro 2016) and can be found at: https://simondedeyne.me/data. (A minimal GloVe-loading sketch follows the table.)
Dataset Splits | Yes | Model hyper-parameters are tuned by 5-fold cross-validation (20% of the data for testing and 80% for training)... We use the remaining word association pairs as the development dataset (word pairs together with their association scores). (A split sketch follows the table.)
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions Theano, Lasagne, Adagrad, and sklearn, but does not give version numbers for these dependencies.
Experiment Setup | Yes | We test the initial learning rate over {0.05, 0.01, 0.5, 0.1}, set the batch size to 25, and train the model for 5 epochs. We set the initial parameters in the three gates to 1.0 and select the best parameters on the development set... In the Ridge model, the optimal regularization parameter is 0.6. The Mapping model is trained with SGD for a maximum of 100 epochs with early stopping, and the optimal learning rate is 0.001. (A configuration sketch follows the table.)
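
The Open Datasets row names 300-dimensional GloVe vectors trained on Common Crawl. Below is a minimal sketch of loading such vectors from the standard whitespace-separated text release; the file name glove.840B.300d.txt and the helper load_glove are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): load 300-d GloVe vectors from the
# standard whitespace-separated text format. The exact file name is an
# assumption; the paper only states Common Crawl, 300 dimensions.
import numpy as np

def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if len(values) == 300:  # skip malformed or multi-token lines
                vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

# glove = load_glove("glove.840B.300d.txt")  # hypothetical path
```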
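
The Dataset Splits row describes 5-fold cross-validation with an 80/20 train/test split per fold. A sketch using scikit-learn's KFold is below; the placeholder array association_pairs stands in for the roughly 20,000 word association pairs and is an assumption.

```python
# Sketch of the 5-fold cross-validation split (80% train / 20% test per fold).
# `association_pairs` is a placeholder for the ~20,000 word association pairs.
import numpy as np
from sklearn.model_selection import KFold

association_pairs = np.arange(20000)  # stand-in data, assumption

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(association_pairs)):
    train_pairs = association_pairs[train_idx]  # ~80% of the pairs
    test_pairs = association_pairs[test_idx]    # ~20% of the pairs
    print(f"fold {fold}: {len(train_pairs)} train / {len(test_pairs)} test")
```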
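
The Experiment Setup row reports a Ridge regularization parameter of 0.6 and a Mapping model trained with SGD (learning rate 0.001, at most 100 epochs, early stopping). The sketch below uses scikit-learn, which the paper mentions; treating the Mapping model as a single-output linear regressor and the early-stopping validation fraction are assumptions, and the authors' Theano/Lasagne implementation may differ.

```python
# Hedged sketch of the reported baseline settings, not the authors' implementation.
import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # stand-in for 300-d textual vectors
y = rng.normal(size=1000)         # stand-in target, e.g. one visual dimension

# Ridge model: "the optimal regularization parameter is 0.6"
ridge = Ridge(alpha=0.6).fit(X, y)

# Mapping model: SGD, constant learning rate 0.001, at most 100 epochs,
# early stopping on a held-out fraction (fraction value is an assumption)
mapping = SGDRegressor(
    learning_rate="constant",
    eta0=0.001,
    max_iter=100,
    early_stopping=True,
    validation_fraction=0.2,
    random_state=0,
).fit(X, y)
```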