TDAF: Top-Down Attention Framework for Vision Tasks

Authors: Bo Pang, Yizhuo Li, Jiefeng Li, Muchen Li, Hanwen Cao, Cewu Lu

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments on several tasks to evaluate our proposed Top-Down Attention Framework. Results reveal that: 1) the ANAR can generate effective attention maps with top-down characteristics; 2) the R2DNS is easy to train in the end-to-end setting; 3) the Top-Down Attention Framework gains accuracy from the mixed top-down and bottom-up features, greatly surpassing the corresponding baselines and other attention methods. We evaluate our framework on several visual tasks: image classification on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and ImageNet (Russakovsky et al. 2015), action recognition on Kinetics (Kay et al. 2017), object detection, and human pose estimation on COCO (Lin et al. 2014).
Researcher Affiliation | Academia | Bo Pang¹, Yizhuo Li¹, Jiefeng Li¹, Muchen Li², Hanwen Cao¹, Cewu Lu¹ (¹Shanghai Jiao Tong University; ²Huazhong University of Science and Technology). {pangbo, liyizhuo, ljf_likit, mbd_chw, lucewu}@sjtu.edu.cn, muchenli1997@gmail.com
Pseudocode | No | The paper contains structural diagrams and formal equations, but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code is publicly available.
Open Datasets | Yes | We evaluate our framework on several visual tasks: image classification on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and ImageNet (Russakovsky et al. 2015), action recognition on Kinetics (Kay et al. 2017), object detection, and human pose estimation on COCO (Lin et al. 2014).
Dataset Splits | Yes | ImageNet and Analysis: We further analyze different attention settings on ImageNet-2012 (Russakovsky et al. 2015). Training images are resized randomly so that the shorter side lies in [256, 480], and a 224 × 224 crop is sampled from the image or its horizontal flip. [...] Performance on the ImageNet validation set (10-crop test). [...] Pose Estimation: We further evaluate TDAF on the pose estimation task using the COCO-2017 dataset. [...] The results are shown in Tab. 5. [...] Table 5: Performance (AP in %) of Simple Pose (Xiao, Wu, and Wei 2018) on the COCO-2017 validation set. (See the pre-processing sketch below the table.)
Hardware Specification | Yes | On a Titan XP with batch size 256, the forward time of Res50 and its attention version is 0.015 s and 0.018 s, and 0.031 s vs. 0.035 s for Res101. (See the timing sketch below the table.)
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We use the SGD optimizer with a 256 mini-batch on 8 GPUs to train. The parameters are initialized by Kaiming initialization proposed in (He et al. 2015). We modify the backbone of FCOS with our TDAF. The data pre-processing and training methods follow the original FCOS (2× training schedule with multi-scale input). (See the optimizer/initialization sketch below the table.)
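
To make the ImageNet pre-processing quoted in the Dataset Splits row concrete, here is a minimal sketch. The paper does not name its framework, so the PyTorch/torchvision choice and the RandomShorterSideResize helper are assumptions; only the [256, 480] shorter-side range, the 224 × 224 crop, and the horizontal flip come from the quote.

```python
import random
from torchvision import transforms
import torchvision.transforms.functional as F

class RandomShorterSideResize:
    """Resize so the shorter side is a random length in [min_size, max_size]."""
    def __init__(self, min_size=256, max_size=480):
        self.min_size = min_size
        self.max_size = max_size

    def __call__(self, img):
        # With an int size, torchvision scales the shorter side to that
        # length while keeping the aspect ratio.
        return F.resize(img, random.randint(self.min_size, self.max_size))

train_transform = transforms.Compose([
    RandomShorterSideResize(256, 480),  # shorter side resized into [256, 480]
    transforms.RandomCrop(224),         # random 224 x 224 crop
    transforms.RandomHorizontalFlip(),  # "... or its horizontal flip"
    transforms.ToTensor(),
])
```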
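The Hardware Specification row reports forward-pass timings. A minimal sketch of how such numbers are typically measured, assuming PyTorch, torchvision's resnet50, and a CUDA GPU; the warm-up and iteration counts are assumptions, and batch size 256 (from the quote) may exceed memory on smaller GPUs.

```python
import time
import torch
import torchvision.models as models

model = models.resnet50().cuda().eval()
x = torch.randn(256, 3, 224, 224).cuda()  # batch size 256, as in the quote

with torch.no_grad():
    for _ in range(10):       # warm-up so one-time CUDA setup isn't timed
        model(x)
    torch.cuda.synchronize()  # GPU kernels run asynchronously; sync first
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
    print(f"mean forward time: {(time.time() - start) / 100:.3f}s")
```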
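The Experiment Setup row quotes SGD with a 256 mini-batch and Kaiming initialization (He et al. 2015). A minimal PyTorch sketch of that setup follows; the learning rate, momentum, and weight decay values are placeholders, not reported in the paper, and the resnet50 backbone is used only as a stand-in.

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet50()  # stand-in backbone

def kaiming_init(m):
    # Kaiming (He) initialization for conv layers, per He et al. 2015.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(kaiming_init)

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,          # placeholder value
                            momentum=0.9,    # placeholder value
                            weight_decay=1e-4)  # placeholder value
```

With 8 GPUs and a global mini-batch of 256, each device would process 32 samples per step under standard data parallelism.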