Emergence of foveal image sampling from learning to attend in visual scenes
Authors: Brian Cheung, Eric Weiss, Bruno Olshausen
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore the optimal retinal sampling lattice for an (overt) attentional system performing a simple visual search task requiring the classification of an object. We propose a learnable retinal sampling lattice to explore what properties are best suited for this task. While evolutionary pressure has tuned the retinal configurations found in the primate retina, we instead utilize gradient descent optimization for our in-silico model by constructing a fully differentiable, dynamically controlled model of attention. Our model, as shown in Figure 3C, is differentiable and trained end-to-end via backpropagation through time. Table 2 compares the classification performance of each model variant on the cluttered MNIST dataset with fixed-size digits (Dataset 1). (A hedged sketch of such a glimpse operator follows the table.) |
| Researcher Affiliation | Academia | Brian Cheung, Eric Weiss, Bruno Olshausen; Redwood Center, UC Berkeley; {bcheung,eaweiss,baolshausen}@berkeley.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository. |
| Open Datasets | Yes | Handwritten digits from the original MNIST dataset (LeCun & Cortes, 1998) are randomly placed over a 100x100 image with varying amounts of distractors (clutter). In contrast to the cluttered MNIST dataset proposed in Mnih et al. (2014), the number of distractors in each image varies randomly from 0 to 20 pieces. (A sketch of this dataset construction follows the table.) |
| Dataset Splits | No | The paper describes the creation of 'Modified Cluttered MNIST Dataset' (Dataset 1) and a variant with variable sized digits (Dataset 2), but does not explicitly provide information on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPUs used for this research. |
| Software Dependencies | No | For stochastic gradient descent optimization we use Adam (Kingma & Ba, 2014) and construct our models in Theano (Bastien et al., 2012). The paper mentions 'Adam' and 'Theano' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Our recurrent network, f_rnn, is a two-layer traditional recurrent network with 512-512 units. Our control network, f_control, is a fully-connected network with 512-3 units (x, y, zoom) in each layer. Similarly, our prediction networks are fully-connected networks with 512-10 units for predicting the class. We use ReLU non-linearities for all hidden unit layers. For stochastic gradient descent optimization we use Adam (Kingma & Ba, 2014). In our classification experiments, the model is given T = 4 glimpses. (These settings are collected in the configuration sketch after the table.) |
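
The Research Type row describes a retinal sampling lattice whose kernel positions and sizes are learned by gradient descent, with a control signal that shifts and zooms the lattice at each glimpse. Below is a minimal NumPy sketch of such a differentiable glimpse operator; the function name `glimpse`, the isotropic-Gaussian parameterisation, and the argument names are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def glimpse(image, mu, sigma, fixation, zoom):
    """Sample an image with an array of Gaussian interpolation kernels.

    image:    (H, W) array
    mu:       (K, 2) learnable kernel centres, relative to the fixation point
    sigma:    (K,)   learnable kernel widths
    fixation: (2,)   glimpse centre chosen by the control network
    zoom:     scalar glimpse zoom chosen by the control network
    Returns a (K,) vector of kernel responses.
    """
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]          # pixel coordinate grids
    centres = fixation + zoom * mu       # zoom rescales the kernel positions
    widths = zoom * sigma                # ... and the kernel sizes
    out = np.empty(len(mu))
    for k, ((cy, cx), s) in enumerate(zip(centres, widths)):
        w = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * s ** 2))
        w /= w.sum()                     # normalise each kernel
        out[k] = (w * image).sum()       # weighted average of the pixels
    return out
```

Because the kernel responses are smooth functions of mu, sigma, fixation, and zoom, gradients can flow back through the sampling step, which is what allows the lattice to be trained end-to-end with backpropagation through time.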
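
The Open Datasets row specifies only the 100x100 canvas and a distractor count drawn from 0 to 20. The sketch below shows one plausible way to build such an example; the 8x8 fragment size follows the Mnih et al. (2014) cluttered-MNIST recipe and is an assumption here, as are the helper name and argument names.

```python
import numpy as np

def make_cluttered_example(digit, others, rng, canvas=100, crop=8, max_clutter=20):
    """Place one 28x28 MNIST digit on a blank canvas with 0-20 distractor fragments.

    digit:  (28, 28) target digit
    others: (N, 28, 28) pool of digits to cut distractor fragments from
    rng:    np.random.Generator
    """
    img = np.zeros((canvas, canvas), dtype=np.float32)

    # paste the target digit at a random position
    y, x = rng.integers(0, canvas - 28, size=2)
    img[y:y + 28, x:x + 28] = np.maximum(img[y:y + 28, x:x + 28], digit)

    # add a random number (0-20) of small fragments cut from other digits
    for _ in range(rng.integers(0, max_clutter + 1)):
        src = others[rng.integers(len(others))]
        sy, sx = rng.integers(0, 28 - crop, size=2)
        frag = src[sy:sy + crop, sx:sx + crop]
        ty, tx = rng.integers(0, canvas - crop, size=2)
        img[ty:ty + crop, tx:tx + crop] = np.maximum(img[ty:ty + crop, tx:tx + crop], frag)
    return img
```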
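
For reference, the hyperparameters reported in the Experiment Setup row are collected below; the dictionary keys are illustrative names, not identifiers from the authors' code.

```python
# Settings reported in the paper's experiment setup (key names are assumptions).
config = {
    "rnn_layers": [512, 512],        # two-layer recurrent core, f_rnn
    "control_layers": [512, 3],      # f_control head outputting (x, y, zoom)
    "prediction_layers": [512, 10],  # class prediction over the 10 MNIST digits
    "nonlinearity": "ReLU",          # hidden-unit activation
    "optimizer": "Adam",             # Kingma & Ba (2014)
    "num_glimpses": 4,               # T = 4 glimpses per classification episode
}
```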