Hard-Attention for Scalable Image Classification
Authors: Athanasios Papadopoulos, Pawel Korus, Nasir Memon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our model against hard-attention baselines on ImageNet, achieving higher accuracy with less resources (FLOPs, processing time and memory). We further test our model on the fMoW dataset, where we process satellite images of size up to 896×896 px, getting up to 2.5× faster processing compared to baselines operating on the same resolution, while achieving higher accuracy as well. |
| Researcher Affiliation | Academia | ¹Tandon School of Engineering, New York University; ²AGH University of Science and Technology |
| Pseudocode | No | The paper describes the architecture and learning rule in text and equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that TNet and BagNet-77 are implemented in TF 2, but it does not provide a specific link or explicit statement about the release of their own source code. |
| Open Datasets | Yes | ImageNet [13] consists of natural images from 1,000 classes. We use the ILSVRC 2012 version, which consists of 1,281,167 training and 50,000 validation images. |
| Dataset Splits | Yes | "We use the ILSVRC 2012 version, which consists of 1,281,167 training and 50,000 validation images." and "They are split in 363,572 training, 53,041 validation and 53,473 testing images." |
| Hardware Specification | Yes | We use a single NVIDIA Quadro RTX 8000 GPU, with 64 GB of RAM, and 20 CPUs to mitigate data pipeline impact. |
| Software Dependencies | Yes | For Saccader and DRAM we use public implementations [21] in TensorFlow (TF) 1. TNet and BagNet-77 are implemented in TF 2. |
| Experiment Setup | Yes | We train TNet with 2 processing levels on images of 224×224 px using class labels only. We train for 200 epochs using the Adam optimizer [35] with initial learning rate 10⁻⁴, that we drop once by a factor of 0.1. We use dropout (keep probability 0.5) in the last layer of feature extraction. We use per-feature regularization with λ_c = λ_r = 0.3. We attend to a fixed number of 3 locations. (A hedged TF 2 sketch of this configuration follows the table.) |
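
The experiment setup above maps onto standard TF 2 / Keras APIs. The following is a minimal sketch of that configuration, not the authors' implementation: the TNet backbone, the hard-attention module, and the per-feature regularization (λ_c = λ_r = 0.3) are not public in this excerpt, so `build_feature_extractor`, the batch size of 64, and the epoch at which the learning rate is dropped are illustrative assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 1000                    # ImageNet (ILSVRC 2012)
NUM_LOCATIONS = 3                     # fixed number of attended locations (not wired into this stub)
EPOCHS = 200
BATCH_SIZE = 64                       # assumption; not stated in the excerpt
STEPS_PER_EPOCH = 1281167 // BATCH_SIZE

def build_feature_extractor():
    """Placeholder backbone; the paper's TNet feature extractor differs."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.5),             # dropout with keep probability 0.5 in the last layer
    ])

inputs = tf.keras.Input(shape=(224, 224, 3))      # 224x224 px inputs
features = build_feature_extractor()(inputs)
logits = tf.keras.layers.Dense(NUM_CLASSES)(features)
model = tf.keras.Model(inputs, logits)

# Initial learning rate 1e-4, dropped once by a factor of 0.1;
# the drop point (epoch 150) is an assumption -- the excerpt only says "once".
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[150 * STEPS_PER_EPOCH], values=[1e-4, 1e-5])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)  # train_ds/val_ds: ImageNet input pipelines
```

The piecewise-constant schedule reflects the single learning-rate drop reported in the setup; everything specific to TNet's hard-attention learning rule is omitted because the excerpt does not describe it in enough detail to reproduce.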