Learning what and where to attend

Authors: Drew Linsley, Dan Shiebler, Sven Eberhardt, Thomas Serre

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first describe a large-scale online experiment (ClickMe) used to supplement ImageNet with nearly half a million human-derived top-down attention maps. Using human psychophysics, we confirm that the identified top-down features from ClickMe are more diagnostic than bottom-up saliency features for rapid image categorization. As a proof of concept, we extend a state-of-the-art attention network and demonstrate that adding ClickMe supervision significantly improves its accuracy and yields visual features that are more interpretable and more similar to those used by human observers.
Researcher Affiliation | Academia | Drew Linsley, Dan Shiebler, Sven Eberhardt and Thomas Serre, Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI 02912; {drew_linsley,thomas_serre}@brown.edu
Pseudocode | No | The paper describes the architecture and processes in text and diagrams (e.g., Fig. 3) but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | See https://github.com/serre-lab/gala_tpu for a reference implementation.
Open Datasets | Yes | We first describe a large-scale online experiment (ClickMe) used to supplement ImageNet with nearly half a million human-derived top-down attention maps. By supplementing ImageNet with the public release of ClickMe attention maps, we hope to spur interest in the development of network architectures that are not only more robust and accurate, but also more interpretable and consistent with human vision. The dataset can be downloaded from http://serre-lab.clps.brown.edu/resource/clickme.
Dataset Splits | Yes | We set aside approximately 5% of the dataset for validation (17,841 images and importance maps), another 5% for testing (17,581 images and importance maps), and the rest for training (329,036 images and importance maps). (See the split sketch after the table.)
Hardware Specification | Yes | This work was also made possible by Cloud TPU hardware resources that Google made available via the TensorFlow Research Cloud (TFRC) program. Models trained on full versions of ILSVRC12 (Table 2 and Table 4) were trained with Google Cloud TPU v2 devices. Models trained on the ClickMe subset of ILSVRC12 were trained with TITAN X Pascal GPUs (Table 1 in the main text and Table 3).
Software Dependencies | No | The paper states 'All models were implemented in Tensorflow' but does not specify a version number for TensorFlow or any other key software dependencies.
Experiment Setup | Yes | Models were trained for 100 epochs and weights were selected that yielded the best validation accuracy. All models were implemented in TensorFlow and were trained from scratch with weights drawn from a scaled normal distribution. We used SGD with Nesterov momentum (Sutskever et al., 2013) and a piecewise-constant learning rate schedule that decayed by 1/10 after 30, 60, 80, and 90 epochs of training. The dimensionality reduction ratio r of the shrinking operation was set to 4. This analysis demonstrated that both object categorization and ClickMe map prediction improve when λ = 6. (See the optimization sketch after the table.)
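
For concreteness, the roughly 5%/5%/90% split quoted in the Dataset Splits row can be sketched as follows. The counts (17,841 validation, 17,581 test, 329,036 training, i.e., 364,458 total) are the paper's; the split function, seeded shuffle, and use of NumPy are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch of the ~5% validation / ~5% test / ~90% train split.
# The function name and seeded shuffle below are assumptions.
import numpy as np

def split_clickme(image_ids, val_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle image ids and carve off validation and test subsets."""
    rng = np.random.default_rng(seed)
    ids = np.array(image_ids)
    rng.shuffle(ids)                            # in-place shuffle
    n_val = int(round(len(ids) * val_frac))
    n_test = int(round(len(ids) * test_frac))
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test
```

Note that exact 5% fractions of 364,458 images would give 18,223 per split, not the reported 17,841 and 17,581, which is consistent with the paper's hedge of "approximately 5%".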
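The Experiment Setup row can likewise be sketched with TensorFlow/Keras APIs. Only Nesterov SGD, the 1/10 learning rate decays after epochs 30, 60, 80, and 90, and λ = 6 come from the quoted text; the TensorFlow 2 API choice, base learning rate, momentum value, batch size, and the L2 form of the ClickMe map loss are all assumptions.

```python
# A hedged sketch of the optimizer, learning rate schedule, and
# lambda-weighted ClickMe loss described above. Values marked "assumed"
# are not stated in this report.
import tensorflow as tf

steps_per_epoch = 329_036 // 64          # assumed batch size of 64
base_lr = 0.1                            # assumed base learning rate

# Piecewise-constant schedule: decay by 1/10 after epochs 30, 60, 80, 90.
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[e * steps_per_epoch for e in (30, 60, 80, 90)],
    values=[base_lr * f for f in (1.0, 0.1, 0.01, 0.001, 0.0001)],
)
optimizer = tf.keras.optimizers.SGD(
    learning_rate=schedule, momentum=0.9, nesterov=True)

def total_loss(labels, logits, human_maps, predicted_maps, lam=6.0):
    """Cross-entropy categorization loss plus lambda-weighted map loss.

    lambda = 6 is the value the paper reports as improving both object
    categorization and ClickMe map prediction; the L2 map loss is assumed.
    """
    ce = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
    map_loss = tf.reduce_mean(tf.square(human_maps - predicted_maps))
    return ce + lam * map_loss
```

Per the quoted setup, training would run this optimizer for 100 epochs and keep the checkpoint with the best validation accuracy.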