Explicitly Modeled Attention Maps for Image Classification
Authors: Andong Tan, Duc Tam Nguyen, Maximilian Dax, Matthias Nießner, Thomas Brox
AAAI 2021, pp. 9799–9807 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation shows that our method achieves an accuracy improvement of up to 2.2% over the ResNet baselines on ImageNet ILSVRC and outperforms other self-attention methods such as AA-ResNet152 in accuracy by 0.9% with 6.4% fewer parameters and 6.7% fewer GFLOPs. This result empirically indicates the value of incorporating a geometric prior into the self-attention mechanism when applied to image classification. (A hedged sketch of such a geometric attention prior follows the table.) |
| Researcher Affiliation | Collaboration | 1 Technical University of Munich, 2 University of Freiburg, 3 University of Bonn, 4 Robert Bosch GmbH |
| Pseudocode | No | The paper presents mathematical equations (Eq. 1-6) for its method but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using PyTorch baselines for its experiments but does not provide any explicit statement or link to its own open-source code for the methodology described. |
| Open Datasets | Yes | small-scale and large-scale image classification datasets including CIFAR10, CIFAR100 (Krizhevsky 2009), Tiny ImageNet (Yao and Miller 2015) and ImageNet (Deng et al. 2009). |
| Dataset Splits | Yes | All experiments (including AA-Net and ExpAtt-Net) are based on the respective baselines from PyTorch (Paszke et al. 2019), use synchronous SGD with momentum 0.9, and a cosine learning rate with restarts (Loshchilov and Hutter 2016) for a total of 450 epochs, 164 epochs, and 324 epochs in the CIFAR, ImageNet, and Tiny ImageNet experiments respectively. Concretely, in the first 15 epochs the learning rate is linearly increased to 0.05, then a cosine learning rate with restarts at epochs 25, 45, 85, 165, 325 is applied where applicable. Additionally, CIFAR experiments use learning rate 0.0002 between epoch 325 and 450. Batch sizes of all experiments are chosen to fit the GPU memory. The radius σ of the Gaussian kernel is initialized to 0.75. |
| Hardware Specification | No | The paper mentions that “Batch size of all experiments are chosen to fit the GPU memory” but does not provide specific details about the GPU model, CPU, or any other hardware used for the experiments. |
| Software Dependencies | No | The paper mentions “PyTorch (Paszke et al. 2019)” but does not provide a specific version number for it or for any other software libraries or dependencies. |
| Experiment Setup | Yes | Models are trained from scratch. All experiments (including AA-Net and ExpAtt-Net) are based on the respective baselines from PyTorch (Paszke et al. 2019), use synchronous SGD with momentum 0.9, and a cosine learning rate with restarts (Loshchilov and Hutter 2016) for a total of 450 epochs, 164 epochs, and 324 epochs in the CIFAR, ImageNet, and Tiny ImageNet experiments respectively. Concretely, in the first 15 epochs the learning rate is linearly increased to 0.05, then a cosine learning rate with restarts at epochs 25, 45, 85, 165, 325 is applied where applicable. Additionally, CIFAR experiments use learning rate 0.0002 between epoch 325 and 450. Batch sizes of all experiments are chosen to fit the GPU memory. The radius σ of the Gaussian kernel is initialized to 0.75. (A learning-rate schedule sketch in code follows the table.) |
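The Research Type row quotes the paper's central claim: an explicitly modeled geometric prior improves self-attention for image classification. The concrete formulation (Eqs. 1-6) is not reproduced in this report, so the module below is only a minimal sketch under stated assumptions: attention weights that depend purely on the spatial distance between positions through an isotropic Gaussian whose radius σ is a learnable parameter initialized to 0.75 (the one prior hyperparameter the report quotes). The module name `GaussianAttentionMap`, the 1×1 value projection, and the row-wise normalization are illustrative choices, not the authors' method.

```python
import torch
import torch.nn as nn


class GaussianAttentionMap(nn.Module):
    """Sketch of an explicitly modeled, Gaussian-shaped spatial attention map.

    Each output position attends to all input positions with a weight that
    decays with squared spatial distance, controlled by a learnable radius
    sigma (initialized to 0.75 as quoted in the report). The value projection
    and the row-wise normalization are assumptions for illustration only.
    """

    def __init__(self, channels: int, init_sigma: float = 0.75):
        super().__init__()
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Parameterize sigma in log space so it stays positive during training.
        self.log_sigma = nn.Parameter(torch.tensor(float(init_sigma)).log())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Coordinates of every spatial position, shape (h*w, 2).
        ys, xs = torch.meshgrid(
            torch.arange(h, device=x.device, dtype=x.dtype),
            torch.arange(w, device=x.device, dtype=x.dtype),
            indexing="ij",
        )
        coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1)
        # Pairwise squared distances between positions, shape (h*w, h*w).
        d2 = torch.cdist(coords, coords).pow(2)
        sigma = self.log_sigma.exp()
        # Gaussian attention map, normalized so each query row sums to 1.
        attn = torch.exp(-d2 / (2 * sigma ** 2))
        attn = attn / attn.sum(dim=-1, keepdim=True)
        # Aggregate projected values with the purely geometric weights.
        v = self.value(x).flatten(2)                # (b, c, h*w)
        out = torch.einsum("qk,bck->bcq", attn, v)  # (b, c, h*w)
        return out.view(b, c, h, w)


# Example usage on a dummy feature map.
feat = torch.randn(2, 64, 8, 8)
out = GaussianAttentionMap(channels=64)(feat)
print(out.shape)  # torch.Size([2, 64, 8, 8])
```

Since the map depends only on feature-map geometry, it could in principle be precomputed once per spatial resolution; whether the paper does this is not stated in the report.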
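The Experiment Setup row fixes the optimization recipe: synchronous SGD with momentum 0.9, a 15-epoch linear warmup to 0.05, cosine annealing with restarts at epochs 25, 45, 85, 165 and 325 where applicable, and a constant 0.0002 for CIFAR between epochs 325 and 450. Below is a minimal sketch of that schedule as a plain epoch-to-learning-rate function; the minimum learning rate of 0 inside each cosine segment is an assumption, and for the shorter ImageNet (164-epoch) and Tiny ImageNet (324-epoch) runs only the restart boundaries that fall inside the run would apply.

```python
import math


def lr_at_epoch(epoch: int,
                peak_lr: float = 0.05,
                warmup_epochs: int = 15,
                restarts=(25, 45, 85, 165, 325),
                cifar_tail_lr: float = 0.0002) -> float:
    """Sketch of the quoted schedule for a 450-epoch CIFAR run."""
    if epoch < warmup_epochs:
        # Linear warmup to the peak learning rate over the first 15 epochs.
        return peak_lr * (epoch + 1) / warmup_epochs
    if epoch >= restarts[-1]:
        # CIFAR runs use a constant 0.0002 between epoch 325 and 450.
        return cifar_tail_lr
    # Cosine annealing between consecutive restart boundaries (SGDR-style).
    boundaries = [warmup_epochs, *restarts]
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if start <= epoch < end:
            progress = (epoch - start) / (end - start)
            return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
    return cifar_tail_lr  # unreachable for valid epochs, kept as a safe default


# Print the learning rate at a few representative epochs.
for e in (0, 14, 15, 24, 25, 100, 324, 325, 449):
    print(e, round(lr_at_epoch(e), 5))
```

In a PyTorch training loop this function could drive `torch.optim.SGD(..., momentum=0.9)` through a `LambdaLR`-style wrapper; the report does not say which scheduler implementation the authors actually used.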