Visalogy: Answering Visual Analogy Questions

Authors: Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper we study the problem of visual analogies for natural images and show the first results of their kind on solving visual analogy questions for natural images. Our experimental evaluations show promising results on solving visual analogy questions."
Researcher Affiliation | Collaboration | Fereshteh Sadeghi (University of Washington, fsadeghi@cs.washington.edu); C. Lawrence Zitnick (Microsoft Research, larryz@microsoft.com); Ali Farhadi (University of Washington and The Allen Institute for AI, ali@cs.washington.edu)
Pseudocode | No | The paper presents a network architecture diagram in Figure 2, but no pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | "To evaluate the capability of our trained network for solving analogy questions in the test scenarios explained above, we use a large dataset of 3D chairs [4] as well as a novel dataset of natural images (VAQA) that we collected for solving analogy questions on natural images."
Dataset Splits | Yes | "We randomly select 1000 styles and 16 viewpoints for training and keep the rest for testing." "We have also used the double margin loss function introduced in Equation 3 with m_P = 0.2, m_N = 0.4, which we empirically found to give the best results in a held-out validation set."
Hardware Specification | No | The paper does not provide hardware details such as the GPU/CPU models or memory used for the experiments.
Software Dependencies | No | The paper mentions the "AlexNet pre-trained network for the task of large-scale object recognition (ILSVRC2012) provided by the BVLC Caffe website [31]", but does not specify version numbers for Caffe or any other software dependency.
Experiment Setup | Yes | "In all the experiments, we use stochastic gradient descent (SGD) to train our network. We fine-tune the last two fully connected layers (fc6, fc7) and the last convolutional layer (conv5) unless stated otherwise. We have also used the double margin loss function introduced in Equation 3 with m_P = 0.2, m_N = 0.4, which we empirically found to give the best results in a held-out validation set." Illustrative sketches of this loss and the fine-tuning setup follow the table.
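
The double margin loss quoted above (Equation 3 in the paper) keeps the analogy embeddings of positive quadruples within a margin m_P of each other and pushes negative quadruples at least m_N apart. Below is a minimal PyTorch sketch of that general double-margin contrastive form, assuming Euclidean distance between the A-to-B and C-to-D transformation embeddings; the paper's exact distance function and reduction are not restated in this summary.

    import torch

    def double_margin_loss(x_ab, x_cd, is_positive, m_p=0.2, m_n=0.4):
        # x_ab, x_cd: (batch, dim) embeddings of the A->B and C->D transformations.
        # is_positive: (batch,) tensor, 1.0 for analogy pairs, 0.0 for negatives.
        # m_p, m_n: positive/negative margins (0.2 and 0.4 per the paper).
        dist = torch.norm(x_ab - x_cd, dim=1)  # Euclidean distance (assumed)
        pos = is_positive * torch.clamp(dist - m_p, min=0.0)           # pull positives within m_p
        neg = (1.0 - is_positive) * torch.clamp(m_n - dist, min=0.0)   # push negatives past m_n
        return (pos + neg).mean()

With m_N > m_P, pairs whose distance falls between the two margins incur no penalty, which is the usual motivation for a double margin over a single-margin contrastive loss.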
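The Experiment Setup row states that only conv5, fc6, and fc7 are fine-tuned with SGD. Here is a minimal sketch of that setup using torchvision's AlexNet as a stand-in for the BVLC Caffe model the paper used; the learning rate and momentum below are illustrative assumptions, as the paper's quoted text does not give them.

    import torch
    from torchvision import models

    # Load an ImageNet-pretrained AlexNet and freeze all parameters.
    net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    for p in net.parameters():
        p.requires_grad = False

    # Unfreeze the last conv layer and the first two fully connected layers,
    # the torchvision analogues of Caffe's conv5, fc6, and fc7.
    for layer in (net.features[10], net.classifier[1], net.classifier[4]):
        for p in layer.parameters():
            p.requires_grad = True

    # Train only the unfrozen layers with SGD, as described in the paper.
    optimizer = torch.optim.SGD(
        (p for p in net.parameters() if p.requires_grad),
        lr=0.001, momentum=0.9)  # assumed hyperparameters, not from the paper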