Explicitly Modeled Attention Maps for Image Classification
Authors: Andong Tan, Duc Tam Nguyen, Maximilian Dax, Matthias Nießner, Thomas Brox
AAAI 2021, pp. 9799–9807 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation shows that our method achieves an accuracy improvement of up to 2.2% over the ResNet baselines on ImageNet ILSVRC and outperforms other self-attention methods such as AA-ResNet152 in accuracy by 0.9% with 6.4% fewer parameters and 6.7% fewer GFLOPs. This result empirically indicates the value of incorporating a geometric prior into the self-attention mechanism when applied to image classification. (A hedged sketch of such a geometric attention prior follows the table.) |
| Researcher Affiliation | Collaboration | 1 Technical University of Munich, 2 University of Freiburg, 3 University of Bonn, 4 Robert Bosch GmbH |
| Pseudocode | No | The paper presents mathematical equations (Eq. 1-6) for its method but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using PyTorch baselines for its experiments but does not provide any explicit statement or link to its own open-source code for the methodology described. |
| Open Datasets | Yes | small-scale and large-scale image classification datasets including CIFAR10, CIFAR100 (Krizhevsky 2009), Tiny ImageNet (Yao and Miller 2015) and ImageNet (Deng et al. 2009). |
| Dataset Splits | Yes | All experiments (including AA-Net and ExpAtt-Net) are based on the respective baselines from PyTorch (Paszke et al. 2019), use synchronous SGD with momentum 0.9, and a cosine learning rate with restarts (Loshchilov and Hutter 2016) for a total of 450 epochs, 164 epochs, and 324 epochs in the CIFAR, ImageNet, and Tiny ImageNet experiments respectively. Concretely, in the first 15 epochs the learning rate is linearly increased to 0.05, then a cosine learning rate with restarts at epochs 25, 45, 85, 165, 325 is applied where applicable. Additionally, CIFAR experiments use learning rate 0.0002 between epoch 325 and 450. Batch sizes of all experiments are chosen to fit the GPU memory. The radius σ of the Gaussian kernel is initialized to 0.75. |
| Hardware Specification | No | The paper mentions that “Batch size of all experiments are chosen to fit the GPU memory” but does not provide specific details about the GPU model, CPU, or any other hardware used for the experiments. |
| Software Dependencies | No | The paper mentions “PyTorch (Paszke et al. 2019)” but does not provide a specific version number for it or for any other software libraries or dependencies. |
| Experiment Setup | Yes | Models are trained from scratch. All experiments (including AA-Net and ExpAtt-Net) are based on the respective baselines from PyTorch (Paszke et al. 2019), use synchronous SGD with momentum 0.9, and a cosine learning rate with restarts (Loshchilov and Hutter 2016) for a total of 450 epochs, 164 epochs, and 324 epochs in the CIFAR, ImageNet, and Tiny ImageNet experiments respectively. Concretely, in the first 15 epochs the learning rate is linearly increased to 0.05, then a cosine learning rate with restarts at epochs 25, 45, 85, 165, 325 is applied where applicable. Additionally, CIFAR experiments use learning rate 0.0002 between epoch 325 and 450. Batch sizes of all experiments are chosen to fit the GPU memory. The radius σ of the Gaussian kernel is initialized to 0.75. (A learning-rate schedule sketch in code follows the table.) |
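The Research Type row quotes the paper's central claim: an explicitly modeled geometric prior improves self-attention for image classification. The concrete formulation (Eqs. 1-6) is not reproduced in this report, so the module below is only a minimal sketch under stated assumptions: attention weights that depend purely on the spatial distance between positions through an isotropic Gaussian whose radius σ is a learnable parameter initialized to 0.75 (the one prior hyperparameter the report quotes). The module name `GaussianAttentionMap`, the 1×1 value projection, and the row-wise normalization are illustrative choices, not the authors' method.

```python
import torch
import torch.nn as nn


class GaussianAttentionMap(nn.Module):
    """Sketch of an explicitly modeled, Gaussian-shaped spatial attention map.

    Each output position attends to all input positions with a weight that
    decays with squared spatial distance, controlled by a learnable radius
    sigma (initialized to 0.75 as quoted in the report). The value projection
    and the row-wise normalization are assumptions for illustration only.
    """

    def __init__(self, channels: int, init_sigma: float = 0.75):
        super().__init__()
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Parameterize sigma in log space so it stays positive during training.
        self.log_sigma = nn.Parameter(torch.tensor(float(init_sigma)).log())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Coordinates of every spatial position, shape (h*w, 2).
        ys, xs = torch.meshgrid(
            torch.arange(h, device=x.device, dtype=x.dtype),
            torch.arange(w, device=x.device, dtype=x.dtype),
            indexing="ij",
        )
        coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1)
        # Pairwise squared distances between positions, shape (h*w, h*w).
        d2 = torch.cdist(coords, coords).pow(2)
        sigma = self.log_sigma.exp()
        # Gaussian attention map, normalized so each query row sums to 1.
        attn = torch.exp(-d2 / (2 * sigma ** 2))
        attn = attn / attn.sum(dim=-1, keepdim=True)
        # Aggregate projected values with the purely geometric weights.
        v = self.value(x).flatten(2)                # (b, c, h*w)
        out = torch.einsum("qk,bck->bcq", attn, v)  # (b, c, h*w)
        return out.view(b, c, h, w)


# Example usage on a dummy feature map.
feat = torch.randn(2, 64, 8, 8)
out = GaussianAttentionMap(channels=64)(feat)
print(out.shape)  # torch.Size([2, 64, 8, 8])
```

Since the map depends only on feature-map geometry, it could in principle be precomputed once per spatial resolution; whether the paper does this is not stated in the report.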
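The Experiment Setup row fixes the optimization recipe: synchronous SGD with momentum 0.9, a 15-epoch linear warmup to 0.05, cosine annealing with restarts at epochs 25, 45, 85, 165 and 325 where applicable, and a constant 0.0002 for CIFAR between epochs 325 and 450. Below is a minimal sketch of that schedule as a plain epoch-to-learning-rate function; the minimum learning rate of 0 inside each cosine segment is an assumption, and for the shorter ImageNet (164-epoch) and Tiny ImageNet (324-epoch) runs only the restart boundaries that fall inside the run would apply.

```python
import math


def lr_at_epoch(epoch: int,
                peak_lr: float = 0.05,
                warmup_epochs: int = 15,
                restarts=(25, 45, 85, 165, 325),
                cifar_tail_lr: float = 0.0002) -> float:
    """Sketch of the quoted schedule for a 450-epoch CIFAR run."""
    if epoch < warmup_epochs:
        # Linear warmup to the peak learning rate over the first 15 epochs.
        return peak_lr * (epoch + 1) / warmup_epochs
    if epoch >= restarts[-1]:
        # CIFAR runs use a constant 0.0002 between epoch 325 and 450.
        return cifar_tail_lr
    # Cosine annealing between consecutive restart boundaries (SGDR-style).
    boundaries = [warmup_epochs, *restarts]
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if start <= epoch < end:
            progress = (epoch - start) / (end - start)
            return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
    return cifar_tail_lr  # unreachable for valid epochs, kept as a safe default


# Print the learning rate at a few representative epochs.
for e in (0, 14, 15, 24, 25, 100, 324, 325, 449):
    print(e, round(lr_at_epoch(e), 5))
```

In a PyTorch training loop this function could drive `torch.optim.SGD(..., momentum=0.9)` through a `LambdaLR`-style wrapper; the report does not say which scheduler implementation the authors actually used.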