Saccader: Improving Accuracy of Hard Attention Models for Vision

Authors: Gamaleldin Elsayed, Simon Kornblith, Quoc V. Le

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our best models narrow the gap to common Image Net baselines, achieving 75% top-1 and 91% top-5 while attending to less than one-third of the image. Our results show that the Saccader model is highly accurate compared to other visual attention models while remaining interpretable (Figure 1).
Researcher Affiliation Industry Gamaleldin F. Elsayed Google Research, Brain Team gamaleldin@google.com Simon Kornblith Google Research, Brain Team Quoc V. Le Google Research, Brain Team
Pseudocode No The paper describes the model architecture and training procedure in text and with equations, but does not include any pseudocode or algorithm blocks.
Open Source Code Yes Model code available at github.com/google-research/google-research/tree/master/saccader.
Open Datasets Yes In all our training, we divide the standard Image Net ILSVRC 2012 training set into training and development subsets.
Dataset Splits Yes In all our training, we divide the standard Image Net ILSVRC 2012 training set into training and development subsets. We trained our model on the training subset and chose our hyperparameters based on the development subset. We follow common practice and report results on the separate ILSVRC 2012 validation set, which we do not use for training or hyperparameter selection.
Hardware Specification No The paper does not provide specific details regarding the hardware used for experiments, such as GPU or CPU models. It only generally mentions 'computational resources'.
Software Dependencies No The paper does not explicitly list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup Yes In each of the above steps, we trained our model for 120 epochs using Nesterov momentum of 0.9. (See Appendix for training hyperparameters).