Multimodal Neural Language Models

Authors: Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel

Venue: ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentation is performed on three datasets with image-text descriptions: IAPR TC-12, Attributes Discovery, and the SBU dataset. We further illustrate capabilities of our models through quantitative retrieval evaluation and visualizations of our results.
Researcher Affiliation | Academia | Ryan Kiros (RKIROS@CS.TORONTO.EDU), Ruslan Salakhutdinov (RSALAKHU@CS.TORONTO.EDU), Richard Zemel (ZEMEL@CS.TORONTO.EDU), Department of Computer Science, University of Toronto; Canadian Institute for Advanced Research.
Pseudocode | No | The paper describes algorithms and models in text and diagrams (Figure 2) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that the Attributes Discovery dataset split and SBU word embeddings 'will be made publicly available', but does not explicitly state that the source code for their methodology is available or provide a link.
Open Datasets | Yes | We perform experimental evaluation of our proposed models on three publicly available datasets: IAPR TC-12: This data set consists of 20,000 images... We used a publicly available train/test split for our experiments. Attribute Discovery: This dataset contains roughly 40,000 images... We used a random train/test split for our experiments which will be made publicly available. SBU Captioned Photos: We obtained a subset of roughly 400,000 images from the SBU dataset (Ordonez et al., 2011)...
Dataset Splits | Yes | For each of our experiments, we split the training set into 80% training and 20% validation. (A minimal split sketch appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments, only mentioning general computational aspects like 'gradients from the loss could then be backpropagated from the language model through the convolutional network to update filter weights'.
Software Dependencies | No | The paper mentions using pre-trained embeddings of Turian et al. (2010), but does not specify software dependencies with version numbers.
Experiment Setup | Yes | Each of our language models were trained using the following hyperparameters: all context matrices used a weight decay of 1.0 × 10^-4 while word representations used a weight decay of 1.0 × 10^-5. All other weight matrices, including the convolutional network filters, use a weight decay of 1.0 × 10^-4. We used batch sizes of 20 and an initial learning rate of 0.2 (averaged over the minibatch) which was exponentially decreased at each epoch by a factor of 0.998. Gated methods used an initial learning rate of 0.02. Initial momentum was set to 0.5 and was increased linearly to 0.9 over 20 epochs. The word representation matrices were initialized to the 50 dimensional pre-trained embeddings of Turian et al. (2010). We used a context size of 5 for each of our models. ... Since features used have varying dimensionality, an additional layer was added to map images to 256 dimensions, so that across all experiments the input size to the bias and gating units are equivalent. (These settings are collected into a code sketch after the table.)
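
The 80% training / 20% validation split quoted under Dataset Splits can be illustrated with a minimal Python sketch. The shuffling procedure, function name, and random seed below are assumptions for illustration, not details taken from the paper.

    import numpy as np

    def train_val_split(num_train_examples, val_frac=0.2, seed=0):
        # Shuffle the training indices and hold out 20% for validation.
        rng = np.random.RandomState(seed)
        idx = rng.permutation(num_train_examples)
        n_val = int(val_frac * num_train_examples)
        return idx[n_val:], idx[:n_val]  # train indices, validation indices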
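
The Experiment Setup row lists concrete hyperparameters; the Python sketch below collects them into a single configuration together with the learning-rate decay and momentum ramp described in the quote. The variable names and schedule functions are illustrative assumptions, not the authors' code.

    # Hyperparameters as quoted from the paper; names are illustrative.
    config = {
        "weight_decay_context": 1.0e-4,   # context matrices
        "weight_decay_words": 1.0e-5,     # word representation matrix
        "weight_decay_other": 1.0e-4,     # all other weights, incl. conv filters
        "batch_size": 20,
        "lr_init": 0.2,                   # gated methods used 0.02
        "lr_decay_per_epoch": 0.998,      # exponential decay factor
        "momentum_init": 0.5,
        "momentum_final": 0.9,
        "momentum_ramp_epochs": 20,
        "word_dim": 50,                   # Turian et al. (2010) embeddings
        "context_size": 5,
        "image_proj_dim": 256,            # extra layer mapping image features to 256-d
    }

    def learning_rate(epoch, cfg=config):
        # Initial rate, exponentially decreased by a factor of 0.998 each epoch.
        return cfg["lr_init"] * cfg["lr_decay_per_epoch"] ** epoch

    def momentum(epoch, cfg=config):
        # Momentum increased linearly from 0.5 to 0.9 over the first 20 epochs.
        frac = min(epoch / cfg["momentum_ramp_epochs"], 1.0)
        return cfg["momentum_init"] + frac * (cfg["momentum_final"] - cfg["momentum_init"])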