Learning Multi-Modal Word Representation Grounded in Visual Context

Authors: Éloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experiments and extensive analysis of the obtained results. (Section 6, Experiments and Results)
Researcher Affiliation | Academia | Éloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari. LIP6, UPMC Univ Paris 06, UMR 7606, CNRS, Sorbonne Universités, F-75005 Paris, France. {eloi.zablocki, benjamin.piwowarski, laure.soulier, patrick.gallinari}@lip6.fr
Pseudocode | No | The paper describes its model and loss functions using mathematical equations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for its methodology is publicly available.
Open Datasets | Yes | We use a large collection of English texts, a dump of the Wikipedia database (http://dumps.wikimedia.org/enwiki), cleaned and tokenized with the Gensim software (Řehůřek and Sojka). This provides us with 4.2 million articles, and a vocabulary of 2.1 million unique words. For visual data, we use the Visual Genome dataset (Krishna et al. 2017)... (See the data-preparation sketch after this table.)
Dataset Splits | Yes | The values of hyperparameters were found with cross-validation: λ = 0.1, μ = 0.1, γ = 0.5, α = 0.2. A linear SVM classifier is trained and 5-fold validation scores are reported. (See the evaluation sketch after this table.)
Hardware Specification | No | The paper mentions passing images through a 'pre-trained Inception-V3 CNN' but does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions using 'python and TensorFlow' and 'Gensim software', but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We use 5 negative examples per entity, and our models are trained with stochastic gradient descent with learning rate lr = 10^-3 and mini-batches of size 64. N and M are regularized with an L2 penalty weighted by scalars λ and μ, respectively. The values of hyperparameters were found with cross-validation: λ = 0.1, μ = 0.1, γ = 0.5, α = 0.2. (See the training-configuration sketch after this table.)
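
The Open Datasets row reports that the Wikipedia dump was cleaned and tokenized with Gensim. Below is a minimal sketch of that preprocessing step, assuming Gensim's standard WikiCorpus reader; the dump file name and the choice to skip dictionary construction are assumptions, not details taken from the paper.

```python
# Minimal sketch: streaming and tokenizing an English Wikipedia dump with Gensim.
# The file name below is an assumption; dumps are available at http://dumps.wikimedia.org/enwiki.
from gensim.corpora import WikiCorpus

DUMP_PATH = "enwiki-latest-pages-articles.xml.bz2"  # assumed file name

def iter_wiki_articles(path=DUMP_PATH):
    """Yield one list of tokens per cleaned Wikipedia article."""
    wiki = WikiCorpus(path, dictionary={})  # dictionary={} skips building a vocabulary up front
    for tokens in wiki.get_texts():
        yield tokens

if __name__ == "__main__":
    for i, tokens in enumerate(iter_wiki_articles()):
        print(len(tokens), tokens[:10])
        if i >= 2:  # preview only a few articles
            break
```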
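The Dataset Splits row states that a linear SVM classifier is trained and 5-fold validation scores are reported. A hedged scikit-learn sketch of that protocol follows; the feature matrix X and labels y are random placeholders, since the report does not specify the data for the classification task.

```python
# Sketch of the 5-fold linear-SVM evaluation protocol; X and y are synthetic placeholders.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))    # e.g. 300-d multi-modal word vectors (assumed dimensionality)
y = rng.integers(0, 2, size=200)   # labels for some word-classification task (placeholder)

clf = LinearSVC()                          # linear SVM classifier
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print("mean 5-fold score: %.3f" % scores.mean())
```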
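The Experiment Setup row fixes the optimizer, mini-batch size, negative-sampling count, and regularization weights. The TensorFlow sketch below wires those reported values together; the matrices N and M, their dimensions, and the placeholder loss are assumptions standing in for the paper's actual equations, not a reconstruction of its model.

```python
# Hedged TensorFlow sketch of the stated training configuration; the loss is a placeholder.
import tensorflow as tf

LR = 1e-3                 # SGD learning rate
BATCH_SIZE = 64           # mini-batch size
NUM_NEGATIVES = 5         # negative examples per entity
LAMBDA, MU = 0.1, 0.1     # L2 penalty weights for N and M (found by cross-validation)
GAMMA, ALPHA = 0.5, 0.2   # remaining reported hyperparameters (alpha unused in this sketch)

dim_text, dim_visual = 300, 2048  # assumed dimensions (word vectors, Inception-V3 features)

N = tf.Variable(tf.random.normal([dim_text, dim_visual]), name="N")
M = tf.Variable(tf.random.normal([dim_text, dim_visual]), name="M")
optimizer = tf.keras.optimizers.SGD(learning_rate=LR)

def placeholder_loss(batch, negatives):
    # Stand-in for the paper's loss over positive and negative pairs (not the real formulation).
    pos = tf.reduce_mean(tf.matmul(batch, N) ** 2)
    neg = tf.reduce_mean(tf.matmul(negatives, M) ** 2)
    return tf.nn.relu(GAMMA + pos - neg)

def regularization():
    # L2 penalties on N and M, weighted by lambda and mu respectively.
    return LAMBDA * tf.reduce_sum(N ** 2) + MU * tf.reduce_sum(M ** 2)

def train_step(batch, negatives):
    # One SGD update over a mini-batch and its sampled negatives.
    with tf.GradientTape() as tape:
        loss = placeholder_loss(batch, negatives) + regularization()
    grads = tape.gradient(loss, [N, M])
    optimizer.apply_gradients(zip(grads, [N, M]))
    return loss

if __name__ == "__main__":
    batch = tf.random.normal([BATCH_SIZE, dim_text])
    negatives = tf.random.normal([BATCH_SIZE * NUM_NEGATIVES, dim_text])
    print(float(train_step(batch, negatives)))
```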