Emergence of foveal image sampling from learning to attend in visual scenes

Authors: Brian Cheung, Eric Weiss, Bruno Olshausen

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We explore what the optimal retinal sampling lattice is for an (overt) attentional system performing a simple visual search task requiring the classification of an object. We propose a learnable retinal sampling lattice to explore which properties are best suited for this task. While evolutionary pressure has tuned the retinal configurations found in the primate retina, we instead utilize gradient descent optimization for our in-silico model by constructing a fully differentiable, dynamically controlled model of attention. Our model, shown in Figure 3C, is differentiable and trained end-to-end via backpropagation through time. Table 2 compares the classification performance of each model variant on the cluttered MNIST dataset with fixed-size digits (Dataset 1). (A sketch of the glimpse mechanism appears after this table.)
Researcher Affiliation | Academia | Brian Cheung, Eric Weiss, Bruno Olshausen, Redwood Center, UC Berkeley, {bcheung,eaweiss,baolshausen}@berkeley.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository.
Open Datasets | Yes | Handwritten digits from the original MNIST dataset (LeCun & Cortes, 1998) are randomly placed over a 100x100 image with varying amounts of distractors (clutter). In contrast to the cluttered MNIST dataset proposed in Mnih et al. (2014), the number of distractors in each image varies randomly from 0 to 20 pieces. (A dataset-generation sketch appears after this table.)
Dataset Splits | No | The paper describes the creation of a 'Modified Cluttered MNIST Dataset' (Dataset 1) and a variant with variable-sized digits (Dataset 2), but does not explicitly state how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPUs used for this research.
Software Dependencies | No | For stochastic gradient descent optimization we use Adam (Kingma & Ba, 2014) and construct our models in Theano (Bastien et al., 2012). The paper names Adam and Theano but does not provide version numbers for either. (The Adam update rule is sketched after this table for reference.)
Experiment Setup | Yes | Our recurrent network, f_rnn, is a two-layer traditional recurrent network with 512-512 units. Our control network, f_control, is a fully-connected network with 512-3 units (x, y, zoom) in each layer. Similarly, our prediction networks are fully-connected networks with 512-10 units for predicting the class. We use ReLU non-linearities for all hidden unit layers. For stochastic gradient descent optimization we use Adam (Kingma & Ba, 2014). In our classification experiments, the model is given T = 4 glimpses. (A sketch of this glimpse loop appears after this table.)
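
The learnable retinal sampling lattice described in the Research Type row can be made concrete with a small sketch. The Python/NumPy code below is a hedged reconstruction of a factorized-Gaussian glimpse: each kernel has a center mu and width sigma (the learnable lattice parameters), and the control signal shifts the lattice to a glimpse center and rescales it by a zoom factor. The function name sample_glimpse, the normalized coordinate convention, and the 12x12 lattice size are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sample_glimpse(image, mu, sigma, center, zoom):
    """Sample an image with a lattice of Gaussian kernels (a sketch).

    image  : (H, W) array, the input scene.
    mu     : (N, 2) learnable kernel centers in normalized [-1, 1] coordinates.
    sigma  : (N,)   learnable kernel widths in normalized units.
    center : (2,)   glimpse position from the control network.
    zoom   : float  glimpse zoom from the control network.
    Returns an (N,) vector of kernel responses (the glimpse).
    """
    H, W = image.shape
    ys = np.linspace(-1.0, 1.0, H)            # pixel rows in normalized coords
    xs = np.linspace(-1.0, 1.0, W)            # pixel columns in normalized coords

    # Shift the whole lattice to the glimpse center and rescale it by the zoom.
    k_mu = center + zoom * mu                 # (N, 2) effective kernel centers
    k_sigma = zoom * sigma                    # (N,)   effective kernel widths

    # Factorized Gaussian interpolation kernels over rows and columns.
    Fy = np.exp(-((ys[None, :] - k_mu[:, 1:2]) ** 2) / (2 * k_sigma[:, None] ** 2))
    Fx = np.exp(-((xs[None, :] - k_mu[:, 0:1]) ** 2) / (2 * k_sigma[:, None] ** 2))
    Fy /= Fy.sum(axis=1, keepdims=True) + 1e-8
    Fx /= Fx.sum(axis=1, keepdims=True) + 1e-8

    # Each kernel response is a weighted sum of image pixels: Fy_i * image * Fx_i^T.
    return np.einsum('nh,hw,nw->n', Fy, image, Fx)

# Example: a 12x12 lattice of kernels (illustrative size) over a 100x100 scene.
grid = np.linspace(-0.5, 0.5, 12)
mu = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)   # (144, 2)
sigma = np.full(len(mu), 0.05)
glimpse = sample_glimpse(np.random.rand(100, 100), mu, sigma,
                         center=np.array([0.0, 0.0]), zoom=1.0)
print(glimpse.shape)   # (144,)
```

Because the kernel responses are smooth functions of mu, sigma, center, and zoom, the glimpse is differentiable with respect to all of them, which is what allows the lattice itself to be optimized by backpropagation through time as the row above states.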
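
The Open Datasets row can be illustrated the same way. The sketch below builds one example of the modified cluttered MNIST data described above: a single MNIST digit placed at a random position on a 100x100 canvas together with 0 to 20 clutter patches cropped from other digits. The 8x8 clutter-patch size, the helper name make_cluttered_example, and the stand-in random arrays used in place of real MNIST images are assumptions made for illustration.

```python
import numpy as np

def make_cluttered_example(digits, labels, rng, canvas=100, patch=8):
    """Place one MNIST digit and 0-20 clutter patches on a blank canvas (a sketch)."""
    img = np.zeros((canvas, canvas), dtype=np.float32)

    # Place the 28x28 target digit at a random location.
    i = rng.integers(len(digits))
    y, x = rng.integers(0, canvas - 28, size=2)
    img[y:y + 28, x:x + 28] = np.maximum(img[y:y + 28, x:x + 28], digits[i])

    # Add a random number (0-20) of small crops taken from other digits as clutter.
    for _ in range(rng.integers(0, 21)):
        j = rng.integers(len(digits))
        sy, sx = rng.integers(0, 28 - patch, size=2)
        crop = digits[j][sy:sy + patch, sx:sx + patch]
        cy, cx = rng.integers(0, canvas - patch, size=2)
        img[cy:cy + patch, cx:cx + patch] = np.maximum(
            img[cy:cy + patch, cx:cx + patch], crop)

    return img, labels[i]

# Usage with placeholder data standing in for real MNIST digits.
rng = np.random.default_rng(0)
digits = rng.random((100, 28, 28)).astype(np.float32)   # stand-in for MNIST images
labels = rng.integers(0, 10, size=100)
image, label = make_cluttered_example(digits, labels, rng)
print(image.shape, label)   # (100, 100) and the target class
```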
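
The Software Dependencies row names Adam (Kingma & Ba, 2014) as the optimizer but gives no version. For reference, the per-parameter update that Adam performs is sketched below; the step size and the beta/epsilon values shown are the defaults from the Adam paper, not settings confirmed by this paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2014) on a parameter array theta."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```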
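
Finally, the Experiment Setup row fixes most of the layer sizes, which the sketch below wires into a forward pass: a two-layer 512-512 recurrent core, a 512-3 control head emitting (x, y, zoom), a 512-10 prediction head, ReLU hidden units, and T = 4 glimpses. It reuses the hypothetical sample_glimpse, mu, and sigma from the first sketch, uses random weights, and leaves out training, so it is a structural illustration under those assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)
dense = lambda n_in, n_out: (rng.normal(0, 0.01, (n_in, n_out)), np.zeros(n_out))

N = 144                                   # glimpse size from the 12x12 lattice above
W_in,  b_in  = dense(N, 512)              # glimpse -> recurrent layer 1
W_h1,  b_h1  = dense(512, 512)            # recurrent weights, layer 1
W_12,  b_12  = dense(512, 512)            # layer 1 -> layer 2
W_h2,  b_h2  = dense(512, 512)            # recurrent weights, layer 2
W_c1,  b_c1  = dense(512, 512)            # control head, hidden 512 units
W_c2,  b_c2  = dense(512, 3)              # control head, output (x, y, zoom)
W_p1,  b_p1  = dense(512, 512)            # prediction head, hidden 512 units
W_p2,  b_p2  = dense(512, 10)             # prediction head, 10 class logits

image = rng.random((100, 100))
h1, h2 = np.zeros(512), np.zeros(512)
center, zoom = np.zeros(2), 1.0

for t in range(4):                                        # T = 4 glimpses
    g = sample_glimpse(image, mu, sigma, center, zoom)
    h1 = relu(g @ W_in + b_in + h1 @ W_h1 + b_h1)         # recurrent layer 1
    h2 = relu(h1 @ W_12 + b_12 + h2 @ W_h2 + b_h2)        # recurrent layer 2
    ctl = relu(h2 @ W_c1 + b_c1) @ W_c2 + b_c2            # next fixation parameters
    center, zoom = ctl[:2], np.exp(ctl[2])                # keep zoom positive (assumption)
    logits = relu(h2 @ W_p1 + b_p1) @ W_p2 + b_p2         # class prediction

print(logits.argmax())
```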