Learning what and where to attend
Authors: Drew Linsley, Dan Shiebler, Sven Eberhardt, Thomas Serre
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first describe a large-scale online experiment (ClickMe) used to supplement ImageNet with nearly half a million human-derived top-down attention maps. Using human psychophysics, we confirm that the identified top-down features from ClickMe are more diagnostic than bottom-up saliency features for rapid image categorization. As a proof of concept, we extend a state-of-the-art attention network and demonstrate that adding ClickMe supervision significantly improves its accuracy and yields visual features that are more interpretable and more similar to those used by human observers. |
| Researcher Affiliation | Academia | Drew Linsley, Dan Shiebler, Sven Eberhardt and Thomas Serre Department of Cognitive, Linguistic & Psychological Sciences Carney Institute for Brain Science Brown University Providence, RI 02912 {drew_linsley,thomas_serre}@brown.edu |
| Pseudocode | No | The paper describes the architecture and processes in text and diagrams (e.g., Fig. 3) but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | See https://github.com/serre-lab/gala_tpu for a reference implementation. |
| Open Datasets | Yes | We first describe a large-scale online experiment (ClickMe) used to supplement ImageNet with nearly half a million human-derived top-down attention maps. By supplementing ImageNet with the public release of ClickMe attention maps, we hope to spur interest in the development of network architectures that are not only more robust and accurate, but also more interpretable and consistent with human vision. The dataset can be downloaded from http://serre-lab.clps.brown.edu/resource/clickme. |
| Dataset Splits | Yes | We set aside approximately 5% of the dataset for validation (17,841 images and importance maps), another 5% for testing (17,581 images and importance maps), and the rest for training (329,036 images and importance maps). (These percentages are checked in the first sketch after the table.) |
| Hardware Specification | Yes | This work was also made possible by Cloud TPU hardware resources that Google made available via the TensorFlow Research Cloud (TFRC) program. Models trained on full versions of ILSVRC12 (Table 2 and Table 4) were trained with Google Cloud TPU v2 devices. Models trained on the ClickMe subset of ILSVRC12 were trained with TITAN X Pascal GPUs (Table 1 in the main text and Table 3). |
| Software Dependencies | No | The paper states 'All models were implemented in TensorFlow' but does not specify a version number for TensorFlow or any other key software dependencies. |
| Experiment Setup | Yes | Models were trained for 100 epochs and weights were selected that yielded the best validation accuracy. All models were implemented in TensorFlow and were trained from scratch with weights drawn from a scaled normal distribution. We used SGD with Nesterov momentum (Sutskever et al., 2013) and a piecewise-constant learning rate schedule that decayed by 1/10 after 30, 60, 80, and 90 epochs of training. The dimensionality reduction ratio r of the shrinking operation was set to 4. This analysis demonstrated that both object categorization and ClickMe map prediction improve when λ = 6. (A hedged sketch of this training setup appears after the table.) |
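
As a quick check on the Dataset Splits row, the sketch below recomputes the split percentages from the three counts quoted from the paper. The counts are the only figures taken from the source; the script itself is purely illustrative.

```python
# Sanity-check the ClickMe split sizes reported in the paper.
# The three counts come from the paper; the rest is arithmetic.
val, test, train = 17_841, 17_581, 329_036
total = val + test + train  # 364,458 image/importance-map pairs in all

for name, n in [("train", train), ("val", val), ("test", test)]:
    print(f"{name:>5}: {n:>7,} ({n / total:.1%})")
# train: 329,036 (90.3%)
#   val:  17,841 (4.9%)
#  test:  17,581 (4.8%)
```

The totals are consistent with the paper's "approximately 5%" description: validation and test hold roughly 4.9% and 4.8% of the 364,458 pairs, respectively.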
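The Experiment Setup row fixes the optimizer, the decay points, and λ, but not the base learning rate, batch size, momentum value, or the exact form of the ClickMe map loss. The sketch below is one plausible wiring of those pieces in modern TensorFlow/Keras (the paper predates TF 2, so this is not the authors' code): `BASE_LR`, `STEPS_PER_EPOCH`, `MOMENTUM`, and the L2 map term are assumptions, while the 1/10 decay after epochs 30, 60, 80, and 90, Nesterov momentum, and λ = 6 come from the paper.

```python
import tensorflow as tf

# Assumed hyperparameters -- not reported in the excerpts above.
BASE_LR = 0.1           # hypothetical base learning rate
STEPS_PER_EPOCH = 1000  # depends on batch size and the 329,036 training pairs
MOMENTUM = 0.9          # common default; the paper's value is not quoted
LAMBDA = 6.0            # ClickMe map-loss weight, as reported in the paper

# Piecewise-constant schedule: decay by 1/10 after epochs 30, 60, 80, 90.
boundaries = [e * STEPS_PER_EPOCH for e in (30, 60, 80, 90)]
values = [BASE_LR * (0.1 ** i) for i in range(len(boundaries) + 1)]
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(boundaries, values)

# SGD with Nesterov momentum, as described in the paper.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=schedule, momentum=MOMENTUM, nesterov=True)

def joint_loss(labels, logits, pred_map, clickme_map):
    """Classification loss plus a lambda-weighted ClickMe map loss.

    The map term is a plain L2 distance here -- an illustrative
    stand-in, not necessarily the loss the authors used.
    """
    ce = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True))
    map_loss = tf.reduce_mean(tf.square(pred_map - clickme_map))
    return ce + LAMBDA * map_loss
```

With five plateau values, the learning rate ends at BASE_LR × 10⁻⁴ for the final 10 epochs, matching the four reported decay points over the 100-epoch run.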