Order-Embeddings of Images and Language
Authors: Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval. ... We evaluate on the Microsoft COCO dataset (Lin et al., 2014b), which has over 120,000 images... Table 2 shows a comparison between all state-of-the-art and some older methods |
| Researcher Affiliation | Academia | Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun Department of Computer Science University of Toronto {vendrov,rkiros,fidler,urtasun}@cs.toronto.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links to code for baselines (e.g., 'http://github.com/seomoz/word2gauss' and 'http://github.com/ryankiros/visual-semantic-embedding'), but provides neither a link to nor an explicit statement about open-source code for its own method. |
| Open Datasets | Yes | We evaluate on the Microsoft COCO dataset (Lin et al., 2014b)... we use the recently proposed SNLI corpus (Bowman et al., 2015)... We use only the WordNet hierarchy as training data (Miller, 1995). |
| Dataset Splits | Yes | We use the data splits of Karpathy & Li (2015) for training (113,287 images), validation (5000 images), and test (5000 images). ... We randomly select 4000 edges for the test split, and another 4000 for the development set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments; it mentions only general components such as CNN features. |
| Software Dependencies | No | The paper cites the optimizer (Adam) and neural network architectures (GRU, VGG) by publication, but does not provide version numbers for any software libraries or dependencies (e.g., Python or deep learning framework versions). |
| Experiment Setup | Yes | We learn a 50-dimensional nonnegative vector... with margin α = 1, sampling 500 true and 500 false hypernym pairs in each batch. We train for 30-50 epochs using the Adam optimizer (Kingma & Ba, 2015) with learning rate 0.01 and early stopping... We sample minibatches of 128 random image-caption pairs... We train for 15-30 epochs using the Adam optimizer with learning rate 0.001... We set the dimension of the embedding space and the GRU hidden state N to 1024, the dimension of the learned word embeddings to 300, and the margin α to 0.05. |
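
For concreteness, the hypernym-prediction configuration quoted in the Experiment Setup row revolves around the paper's order-violation penalty E(x, y) = ||max(0, y − x)||², which is zero exactly when x precedes y under the reversed product order on the nonnegative orthant. Below is a minimal NumPy sketch of that penalty and the max-margin objective, using the quoted sizes (50-dimensional nonnegative embeddings, margin α = 1, 500 true and 500 false pairs per batch); the vocabulary size, the column convention of the pair arrays, and the random sampling are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def order_penalty(x, y):
    """Order-violation energy E(x, y) = ||max(0, y - x)||^2.

    Zero exactly when y <= x coordinate-wise, i.e. when x precedes y
    under the paper's reversed product order (y is the more general
    concept, closer to the origin). Accepts batches of shape (n, dim).
    """
    return np.square(np.maximum(0.0, y - x)).sum(axis=-1)

def hypernym_loss(emb, pos_pairs, neg_pairs, margin=1.0):
    """Max-margin objective: positive (hyponym, hypernym) pairs should
    have zero penalty, negative pairs a penalty of at least `margin`
    (alpha = 1 in the quoted configuration). The column convention
    (hyponym index first) is an assumption of this sketch."""
    pos = order_penalty(emb[pos_pairs[:, 0]], emb[pos_pairs[:, 1]])
    neg = order_penalty(emb[neg_pairs[:, 0]], emb[neg_pairs[:, 1]])
    return pos.sum() + np.maximum(0.0, margin - neg).sum()

# Illustrative batch matching the quoted sizes: 50-dimensional
# nonnegative embeddings, 500 true and 500 false pairs. The vocabulary
# size and random pairs are made up for the example.
rng = np.random.default_rng(0)
emb = np.abs(rng.normal(size=(10_000, 50)))  # nonnegative orthant
pos = rng.integers(0, 10_000, size=(500, 2))
neg = rng.integers(0, 10_000, size=(500, 2))
print(hypernym_loss(emb, pos, neg, margin=1.0))
```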
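The image-caption retrieval configuration in the same row (1024-dimensional embeddings, margin α = 0.05) scores a pair with the negative of the same order-violation energy, treating the caption as an abstraction of the image. The sketch below shows only the ranking step, under the assumption that random nonnegative vectors stand in for the GRU caption encoder and the VGG-derived image embeddings.

```python
import numpy as np

def pairwise_order_violation(images, captions):
    """Order-violation energies between every image and caption pair.

    images: (n_img, d), captions: (n_cap, d), both nonnegative.
    Entry (i, c) of the result penalizes violations of
    'image i precedes caption c' in the partial order, i.e. of the
    caption being an abstraction of the image.
    """
    diff = captions[None, :, :] - images[:, None, :]  # (n_img, n_cap, d)
    return np.square(np.maximum(0.0, diff)).sum(axis=-1)

def rank_captions_for_images(images, captions):
    """For each image, caption indices sorted from best to worst,
    using the negative order-violation energy as the similarity
    score (lowest energy first)."""
    return np.argsort(pairwise_order_violation(images, captions), axis=1)

# Illustrative shapes only: 1024-dimensional embeddings as in the quoted
# setup; abs() stands in for the paper's constraint to the nonnegative
# orthant, and random vectors for the learned caption and image encoders.
rng = np.random.default_rng(0)
img = np.abs(rng.normal(size=(5, 1024)))
cap = np.abs(rng.normal(size=(25, 1024)))
print(rank_captions_for_images(img, cap)[:, :5])  # top-5 captions per image
```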