Where are they looking?

Authors: Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The quantitative evaluation shows that our approach produces reliable results, even when viewing only the back of the head. While our method outperforms several baseline approaches, we are still far from reaching human performance on this task.
Researcher Affiliation | Academia | Adrià Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba, Massachusetts Institute of Technology, {recasens, khosla, vondrick, torralba}@csail.mit.edu
Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our model, code and dataset are available for download at http://gazefollow.csail.mit.edu.
Open Datasets | Yes | Our model, code and dataset are available for download at http://gazefollow.csail.mit.edu. We used several major datasets that contain people as a source of images: 1,548 images from SUN [19], 33,790 images from MS COCO [13], 9,135 images from Actions 40 [20], 7,791 images from PASCAL [4], 508 images from the ImageNet detection challenge [17] and 198,097 images from the Places dataset [22].
Dataset Splits | Yes | We use about 4,782 people of our dataset for testing and the rest for training. We ensured that every person in an image is part of the same split, and to avoid bias, we picked images for testing such that the fixation locations were uniformly distributed across the image. (A sketch of this split procedure follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | We implemented the network using Caffe [10].
Experiment Setup | Yes | The last convolutional layer of the saliency pathway has a 1×1×256 convolution kernel (i.e., K = 256). The remaining fully connected layers in the gaze pathway are of sizes 100, 400, 200, and 169 respectively. The saliency map and gaze mask are 13×13 in size (i.e., D = 13), and we use 5 shifted grids of size 5×5 each (i.e., N = 5). For learning, we augment our training data with flips and random crops with the fixation locations adjusted accordingly. (A sketch of these layer sizes follows the table.)
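
The Dataset Splits row describes the split protocol only in prose. Below is a minimal sketch of how such a split could be reproduced: keep every person from an image in the same split, and draw test images so that fixation locations cover the image roughly uniformly. The annotation format (dicts with 'image_id' and a normalized 'fixation' point), the function name, the 5×5 stratification grid, and the round-robin sampling are illustrative assumptions, not the authors' released split.

```python
import random
from collections import defaultdict

def split_annotations(annotations, n_test_people=4782, grid=5, seed=0):
    """Illustrative train/test split under the constraints quoted above.

    `annotations` is assumed to be a list of dicts with keys 'image_id' and
    'fixation' (a normalized (x, y) in [0, 1]); this format is an assumption,
    not the GazeFollow release format.
    """
    rng = random.Random(seed)

    # Group annotated people by image so an image never straddles the split.
    by_image = defaultdict(list)
    for ann in annotations:
        by_image[ann['image_id']].append(ann)

    # Bucket images by the grid cell of (one of) their fixation points, then
    # sample the buckets round-robin so test fixations spread roughly uniformly.
    buckets = defaultdict(list)
    for image_id, people in by_image.items():
        x, y = people[0]['fixation']
        cell = (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
        buckets[cell].append(image_id)
    for ids in buckets.values():
        rng.shuffle(ids)

    test_ids, n_people = set(), 0
    while n_people < n_test_people and any(buckets.values()):
        for cell in list(buckets):
            if buckets[cell] and n_people < n_test_people:
                image_id = buckets[cell].pop()
                test_ids.add(image_id)
                n_people += len(by_image[image_id])

    test = [a for a in annotations if a['image_id'] in test_ids]
    train = [a for a in annotations if a['image_id'] not in test_ids]
    return train, test
```

The grid granularity is arbitrary; what matters for the quoted protocol is the image-level grouping and the even coverage of fixation locations in the test set.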
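
The Experiment Setup row quotes the layer sizes of the two pathways. The sketch below wires those numbers together in PyTorch (the paper used Caffe); the input feature dimensions, activation choices, and module names are assumptions made only to make the sizes concrete, not a reimplementation of the released model.

```python
import torch
import torch.nn as nn

class GazeFollowHeadSketch(nn.Module):
    """Minimal sketch of the quoted layer sizes (not the released Caffe model).

    Only 256, the FC sizes 100/400/200/169, the 13x13 maps and the 5 shifted
    5x5 grids come from the quoted setup; everything else is an assumption.
    """

    def __init__(self, saliency_channels=256, gaze_in_features=784,
                 d=13, n_grids=5, grid_size=5):
        super().__init__()
        # Saliency pathway: 1x1x256 convolution producing a 13x13 saliency map.
        self.saliency_conv = nn.Conv2d(saliency_channels, 1, kernel_size=1)
        # Gaze pathway: FC layers of sizes 100, 400, 200 and 169 (= 13*13 mask).
        self.gaze_fc = nn.Sequential(
            nn.Linear(gaze_in_features, 100), nn.ReLU(inplace=True),
            nn.Linear(100, 400), nn.ReLU(inplace=True),
            nn.Linear(400, 200), nn.ReLU(inplace=True),
            nn.Linear(200, d * d), nn.Sigmoid(),
        )
        # One classifier per shifted grid: 5 grids of 5x5 = 25 cells each.
        self.shifted_grids = nn.ModuleList(
            nn.Linear(d * d, grid_size * grid_size) for _ in range(n_grids)
        )
        self.d = d

    def forward(self, saliency_features, gaze_features):
        saliency_map = self.saliency_conv(saliency_features)        # (B, 1, 13, 13)
        gaze_mask = self.gaze_fc(gaze_features).view(-1, 1, self.d, self.d)
        combined = (saliency_map * gaze_mask).flatten(1)             # element-wise product
        return [grid(combined) for grid in self.shifted_grids]       # 5 x (B, 25) logits

# Shape check with dummy inputs (batch of 2):
# outs = GazeFollowHeadSketch()(torch.randn(2, 256, 13, 13), torch.randn(2, 784))
# assert [o.shape for o in outs] == [torch.Size([2, 25])] * 5
```

The convolutional feature extractors feeding both pathways, the training loss, and the test-time aggregation of the shifted-grid outputs into a single heatmap are all omitted from this sketch.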