Order-Embeddings of Images and Language

Authors: Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun

ICLR 2016

Each row below gives a reproducibility variable, the extracted result, and the LLM response quoted as evidence:
Research Type: Experimental. "We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval. ... We evaluate on the Microsoft COCO dataset (Lin et al., 2014b), which has over 120,000 images. ... Table 2 shows a comparison between all state-of-the-art and some older methods."
Researcher Affiliation: Academia. "Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun. Department of Computer Science, University of Toronto. {vendrov,rkiros,fidler,urtasun}@cs.toronto.edu"
Pseudocode: No. The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: No. The paper links to code for the baselines it uses (e.g., http://github.com/seomoz/word2gauss and http://github.com/ryankiros/visual-semantic-embedding), but provides no link to, or explicit statement about, code for its own method.
Open Datasets: Yes. "We evaluate on the Microsoft COCO dataset (Lin et al., 2014b)." ... "we use the recently proposed SNLI corpus (Bowman et al., 2015)." ... "We use only the WordNet hierarchy as training data (Miller, 1995)."
Dataset Splits: Yes. "We use the data splits of Karpathy & Li (2015) for training (113,287 images), validation (5000 images), and test (5000 images). ... We randomly select 4000 edges for the test split, and another 4000 for the development set." (A sketch of this edge split follows the table.)
Hardware Specification: No. The paper gives no specific hardware details such as GPU/CPU models, processor types, or memory amounts; it notes only that CNN features were used.
Software Dependencies: No. The paper names its optimizer (Adam) and network architectures (GRU, VGG) with citations, but gives no version numbers for software libraries or other dependencies (e.g., Python or framework versions).
Experiment Setup: Yes. "We learn a 50-dimensional nonnegative vector ... with margin α = 1, sampling 500 true and 500 false hypernym pairs in each batch. We train for 30-50 epochs using the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.01 and early stopping. ... We sample minibatches of 128 random image-caption pairs. ... We train for 15-30 epochs using the Adam optimizer with learning rate 0.001. ... We set the dimension of the embedding space and the GRU hidden state N to 1024, the dimension of the learned word embeddings to 300, and the margin α to 0.05." (The quoted margins and dimensions are enough to reconstruct the training objectives; a sketch follows below.)
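
The WordNet split quoted under Dataset Splits fixes only the held-out sizes (4000 test edges and 4000 development edges). A minimal sketch of such a split, assuming `edges` is the list of transitive-closure hypernym pairs; the function name and seed are illustrative assumptions, not the authors' code:

```python
import random

def split_edges(edges, n_test=4000, n_dev=4000, seed=0):
    """Randomly hold out test and development hypernym edges; the rest
    train. Only the 4000/4000 sizes come from the paper; everything
    else here is an illustrative assumption."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)  # deterministic shuffle for the sketch
    test = edges[:n_test]
    dev = edges[n_test:n_test + n_dev]
    train = edges[n_test + n_dev:]
    return train, dev, test
```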
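
The Experiment Setup row pins down the hyperparameters of the paper's two objectives, both built on the order-violation penalty E(x, y) = ||max(0, y - x)||^2. Below is a minimal PyTorch sketch of that penalty, the hypernym-prediction loss (margin α = 1), and the caption-retrieval ranking loss (margin α = 0.05, with contrastive terms drawn from the 128-pair minibatch). This is not the authors' implementation: function names, tensor shapes, and the pairing conventions (hypernym more general than hyponym, image more specific than its caption) are our reading of the paper.

```python
import torch

def order_violation(x, y):
    """Order penalty E(x, y) = ||max(0, y - x)||^2: zero exactly when x
    dominates y coordinatewise, i.e. when x precedes y in the reversed
    product order on the nonnegative orthant."""
    return torch.clamp(y - x, min=0).pow(2).sum(dim=-1)

def hypernym_loss(u, v, u_neg, v_neg, margin=1.0):
    """Max-margin hypernym objective with alpha = 1 (500 true and 500
    false pairs per batch in the paper): drive the penalty of true
    (hyponym u, hypernym v) pairs to zero and push the penalty of
    corrupted pairs above the margin. Inputs are batches of embeddings;
    the pairing convention (v more general than u) is an assumption."""
    pos = order_violation(u, v)
    neg = torch.clamp(margin - order_violation(u_neg, v_neg), min=0)
    return pos.sum() + neg.sum()

def retrieval_loss(cap, img, margin=0.05):
    """Pairwise ranking loss with alpha = 0.05 over a minibatch of
    matching (caption, image) embedding rows (128 pairs in the paper),
    using every non-matching row in the batch as a contrastive term.
    Treating the image as more specific than its caption is our reading
    of the paper; the masking details are assumptions."""
    # viol[n, m] = E(img[m], cap[n]): violation of "image m lies below
    # caption n" in the partial order; similarity is its negation.
    viol = torch.clamp(cap.unsqueeze(1) - img.unsqueeze(0), min=0).pow(2).sum(-1)
    pos = viol.diagonal()  # penalties of the true caption-image pairs
    # Require the true pair to beat contrastive images / captions by the margin.
    cost_img = torch.clamp(margin + pos.unsqueeze(1) - viol, min=0)
    cost_cap = torch.clamp(margin + pos.unsqueeze(0) - viol, min=0)
    mask = torch.eye(viol.size(0), dtype=torch.bool)  # exclude true pairs
    return cost_img.masked_fill(mask, 0).sum() + cost_cap.masked_fill(mask, 0).sum()
```

The paper constrains embeddings to the nonnegative orthant (e.g., via an absolute value after the final layer) and optimizes with Adam (learning rate 0.01 for hypernym prediction, 0.001 for retrieval); the sketch assumes its inputs already satisfy the nonnegativity constraint.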