Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images
Authors: Junhua Mao, Jiajing Xu, Yushi Jing, Alan L. Yuille
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our model benefits from incorporating the visual information into the word embeddings, and a weight sharing strategy is crucial for learning such multimodal embeddings. |
| Researcher Affiliation | Collaboration | Junhua Mao1 Jiajing Xu2 Yushi Jing2 Alan Yuille1,3 1University of California, Los Angeles 2Pinterest Inc. 3Johns Hopkins University |
| Pseudocode | No | The paper describes the model architecture and components but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'The project page is: http://www.stat.ucla.edu/~junhua.mao/multimodal_embedding.html1.' Footnote 1 adds: 'The datasets introduced in this work will be gradually released on the project page.' This promises release of the datasets only; no release of source code for the method is mentioned. |
| Open Datasets | Yes | More specifically, we introduce a large-scale dataset with 300 million sentences describing over 40 million images crawled and downloaded from publicly available Pins (i.e. an image with sentence descriptions uploaded by users) on Pinterest [2]. ... We denote this dataset as the Pinterest40M dataset. ... To facilitate research in this area, we will gradually release the datasets proposed in this paper on our project page. |
| Dataset Splits | Yes | We train the models until the loss does not decrease on a small validation set with 10,000 images and their descriptions. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'Python's stemmer package' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We use the stochastic gradient descent method with a mini-batch size of 256 sentences and a learning rate of 1.0. The gradient is clipped to 10.0. We train the models until the loss does not decrease on a small validation set with 10,000 images and their descriptions. The models will scan the dataset for roughly five epochs. The bias terms of the gates (i.e. br and bu in Eqn. 1 and 2) in the GRU layer are initialized to 1.0. |
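Since the paper releases no code, the reported optimization settings can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain SGD update, L2-norm gradient clipping (the paper says only that "the gradient is clipped to 10.0", without specifying the clipping variant), and early stopping when validation loss stops decreasing. The helper names (`clip_gradient`, `sgd_step`, `train_until_plateau`) are hypothetical.

```python
import numpy as np

# Hyperparameters reported in the paper's experiment setup.
BATCH_SIZE = 256          # mini-batch of 256 sentences
LEARNING_RATE = 1.0
GRAD_CLIP = 10.0          # "the gradient is clipped to 10.0"
GRU_GATE_BIAS_INIT = 1.0  # b_r and b_u in the GRU initialized to 1.0

def clip_gradient(grad, max_norm=GRAD_CLIP):
    """Rescale the gradient so its L2 norm does not exceed max_norm.
    (Assumption: norm clipping; the paper could also mean value clipping.)"""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def sgd_step(params, grad, lr=LEARNING_RATE):
    """One vanilla SGD update using the clipped gradient."""
    return params - lr * clip_gradient(grad)

def train_until_plateau(train_one_epoch, validate, max_epochs=5):
    """Stop when validation loss no longer decreases -- the paper's
    criterion, checked on a held-out set of 10,000 images. The paper
    reports roughly five epochs in practice."""
    best = float("inf")
    for _ in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss >= best:
            break
        best = val_loss
    return best
```

For example, a gradient of `[30.0, 40.0]` (L2 norm 50) would be rescaled to norm 10 before the update, and `train_until_plateau` halts the first epoch the validation loss fails to improve.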