Context-Aware Zero-Shot Learning for Object Recognition

Authors: Eloi Zablocki, Patrick Bordes, Laure Soulier, Benjamin Piwowarski, Patrick Gallinari

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, through extensive experiments conducted on Visual Genome, we show that contextual information can substantially improve the standard ZSL approach and is robust to unbalanced classes.
Researcher Affiliation | Collaboration | 1 Sorbonne Université, CNRS, Laboratoire d'Informatique de Paris 6, LIP6, F-75005 Paris, France; 2 Criteo AI Lab, Paris.
Pseudocode | No | The paper describes its methods using mathematical formulations and textual explanations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'To facilitate future work on context-aware ZSL, we publicly release data splits and annotations 1.' Footnote 1 provides a URL. However, this explicitly refers to 'data splits and annotations' and not the source code for the methodology described in the paper.
Open Datasets | Yes | We rather use Visual Genome (Krishna et al., 2017), a large-scale image dataset (108K images) annotated at a fine-grained level (3.8M object instances), covering various concepts (105K unique object names). (A data-loading sketch follows the table.)
Dataset Splits | Yes | In order to shape the data to our task, we randomly split the set of images of Visual Genome into train/validation/test sets (70%/10%/20% of the total size). (A split sketch follows the table.)
Hardware Specification | No | The paper mentions using a pre-trained Inception-v3 CNN and the Adam optimizer, but it does not specify any hardware details such as GPU or CPU models, memory, or cloud computing instance types used for experiments.
Software Dependencies | No | The paper refers to algorithms and models like Skip-Gram, Inception-v3 CNN, and Adam, along with their respective citations, but it does not provide specific software version numbers for any libraries, frameworks, or programming languages used.
Experiment Setup | Yes | For each objective L_C, L_V and L_P, at each iteration of the learning algorithm, 5 negative entities are sampled per positive example. Word representations are vectors of R^300, learned with the Skip-Gram algorithm (Mikolov et al., 2013) on Wikipedia. Image regions are cropped, rescaled to (299 × 299), and fed to CNN, an Inception-v3 CNN (Szegedy et al., 2016)... Models are trained with Adam (Kingma & Ba, 2014) and regularized with a L2-penalty... (A training-setup sketch follows the table.)
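The Open Datasets row quotes the Visual Genome statistics (108K images, 3.8M object instances, 105K unique object names). Below is a minimal sketch of how such counts could be reproduced from the publicly distributed objects.json annotation file; the file path and the "objects"/"names" field names follow the commonly distributed Visual Genome format and are assumptions here, not something stated in the paper.

```python
import json
from collections import Counter

# Count images, object instances, and unique object names in Visual Genome.
# Path and field names ("objects", "names") are assumptions based on the
# commonly distributed objects.json annotation format.
with open("visual_genome/objects.json") as f:
    images = json.load(f)  # one entry per annotated image

name_counts = Counter()
num_instances = 0
for image in images:
    for obj in image.get("objects", []):
        num_instances += 1
        for name in obj.get("names", []):
            name_counts[name] += 1

print("images:", len(images))                    # ~108K reported in the paper
print("object instances:", num_instances)        # ~3.8M reported in the paper
print("unique object names:", len(name_counts))  # ~105K reported in the paper
```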
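The Dataset Splits row describes a random image-level 70%/10%/20% train/validation/test split. A minimal sketch of such a split is shown below; the image-ID list and random seed are placeholders, and this is not the authors' released split.

```python
import random

# 70% / 10% / 20% random image-level split, as described in the paper.
# The image-ID list and seed are placeholders, not the authors' released split.
def split_images(image_ids, train=0.70, val=0.10, seed=0):
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(train * len(ids))
    n_val = int(val * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train_ids, val_ids, test_ids = split_images(range(108_000))
print(len(train_ids), len(val_ids), len(test_ids))  # 75600 10800 21600
```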
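The Experiment Setup row lists the main training choices: 299 × 299 region crops fed to a pre-trained Inception-v3, 300-dimensional Skip-Gram word vectors, 5 negative entities per positive example, and Adam with an L2 penalty. Below is a minimal PyTorch sketch of that configuration; the learning rate, weight-decay value, vocabulary size, projection layer, and uniform negative sampling are assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T

# Image regions are rescaled to 299 x 299 and encoded with a pre-trained Inception-v3,
# as quoted above; everything else (lr, weight decay, vocab size, projection,
# uniform negative sampling) is a placeholder.
preprocess = T.Compose([
    T.Resize((299, 299)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

cnn = models.inception_v3(weights="IMAGENET1K_V1")  # requires torchvision >= 0.13
cnn.fc = nn.Identity()   # keep the 2048-d pooled visual features
cnn.eval()

vocab_size, emb_dim = 100_000, 300       # 300-d Skip-Gram word vectors (vocab size assumed)
word_emb = nn.Embedding(vocab_size, emb_dim)
visual_proj = nn.Linear(2048, emb_dim)   # project visual features into the word-embedding space

params = list(word_emb.parameters()) + list(visual_proj.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-5)  # Adam + L2 penalty

def sample_negatives(batch_size, k=5):
    # 5 negative entities per positive example; uniform sampling used here for simplicity
    return torch.randint(0, vocab_size, (batch_size, k))
```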