Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images

Authors: Junhua Mao, Jiajing Xu, Yushi Jing, Alan L. Yuille

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our model benefits from incorporating the visual information into the word embeddings, and a weight sharing strategy is crucial for learning such multimodal embeddings.
Researcher Affiliation | Collaboration | Junhua Mao (1), Jiajing Xu (2), Yushi Jing (2), Alan Yuille (1,3); affiliations: (1) University of California, Los Angeles, (2) Pinterest Inc., (3) Johns Hopkins University
Pseudocode | No | The paper describes the model architecture and components but does not provide any pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'The project page is: http://www.stat.ucla.edu/~junhua.mao/multimodal_embedding.html' and footnote 1 says 'The datasets introduced in this work will be gradually released on the project page.' This mentions dataset release, not an explicit release of the source code for the methodology.
Open Datasets | Yes | More specifically, we introduce a large-scale dataset with 300 million sentences describing over 40 million images crawled and downloaded from publicly available Pins (i.e. an image with sentence descriptions uploaded by users) on Pinterest [2]. ... We denote this dataset as the Pinterest40M dataset. ... To facilitate research in this area, we will gradually release the datasets proposed in this paper on our project page.
Dataset Splits | Yes | We train the models until the loss does not decrease on a small validation set with 10,000 images and their descriptions.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions 'Python's stemmer package' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | We use the stochastic gradient descent method with a mini-batch size of 256 sentences and a learning rate of 1.0. The gradient is clipped to 10.0. We train the models until the loss does not decrease on a small validation set with 10,000 images and their descriptions. The models will scan the dataset for roughly five epochs. The bias terms of the gates (i.e. b_r and b_u in Eqn. 1 and 2) in the GRU layer are initialized to 1.0. (A hedged training-loop sketch of these reported settings follows the table.)
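
The sketch below illustrates the quoted training configuration only; it is not the authors' code. The SentenceEncoder class, the hidden size, the loss function, and the data loaders are hypothetical stand-ins. Only the optimizer choice (SGD), learning rate 1.0, mini-batch size of 256 sentences, gradient clipping at 10.0, GRU gate-bias initialization to 1.0, the roughly five-epoch budget, and early stopping on a 10,000-image validation set come from the paper's description.

```python
# Minimal PyTorch-style sketch (not the authors' released code) of the
# reported optimization settings.
import torch
import torch.nn as nn

HIDDEN = 512  # hypothetical hidden size; not quoted from the paper


class SentenceEncoder(nn.Module):
    """Placeholder GRU sentence encoder standing in for the paper's model."""

    def __init__(self, vocab_size, embed_dim=300, hidden=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True)
        # Initialize the reset (b_r) and update (b_u) gate biases to 1.0.
        # PyTorch packs GRU biases as [b_r | b_z | b_n], each of length `hidden`,
        # split across bias_ih and bias_hh; how the paper's single bias term maps
        # onto this split is an assumption.
        for name, param in self.gru.named_parameters():
            if name.startswith("bias"):
                param.data[: 2 * hidden].fill_(1.0)

    def forward(self, tokens):
        _, h = self.gru(self.embed(tokens))
        return h.squeeze(0)


def train(model, train_loader, val_loader, loss_fn, max_epochs=5):
    """SGD with the reported hyper-parameters and simple early stopping."""
    optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
    best_val = float("inf")
    for epoch in range(max_epochs):  # the paper reports roughly five epochs
        model.train()
        for tokens, targets in train_loader:  # mini-batches of 256 sentences
            optimizer.zero_grad()
            loss = loss_fn(model(tokens), targets)
            loss.backward()
            # "Clipped to 10.0"; norm clipping is assumed here (value clipping
            # would also be consistent with the paper's wording).
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
            optimizer.step()

        # Stop once the loss on the small validation set no longer decreases.
        model.eval()
        with torch.no_grad():
            val_loss = sum(
                loss_fn(model(t), y).item() for t, y in val_loader
            ) / max(len(val_loader), 1)
        if val_loss >= best_val:
            break
        best_val = val_loss
```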