TDAF: Top-Down Attention Framework for Vision Tasks
Authors: Bo Pang, Yizhuo Li, Jiefeng Li, Muchen Li, Hanwen Cao, Cewu Lu
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on several tasks to evaluate our proposed Top-Down Attention Framework. Results reveal that: 1) The ANAR can generate effective attention maps with top-down characteristics. 2) The R2DNS is easy to train in the end-to-end setting. 3) The Top-Down Attention Framework can enjoy accuracy gained from the mixed top-down and bottom-up features, greatly surpassing the corresponding baselines and other attention methods. We evaluate our framework on several visual tasks: image classification on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and ImageNet (Russakovsky et al. 2015), action recognition on Kinetics (Kay et al. 2017), object detection, and human pose estimation on COCO (Lin et al. 2014). |
| Researcher Affiliation | Academia | Bo Pang1, Yizhuo Li1, Jiefeng Li1, Muchen Li2, Hanwen Cao1, Cewu Lu1; 1 Shanghai Jiao Tong University, 2 Huazhong University of Science and Technology; {pangbo, liyizhuo, ljf_likit, mbd_chw, lucewu}@sjtu.edu.cn, muchenli1997@gmail.com |
| Pseudocode | No | The paper contains structural diagrams and formal equations, but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | We evaluate our framework on several visual tasks: image classification on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and Image Net (Russakovsky et al. 2015), action recognition on Kinetics (Kay et al. 2017), objection detection, and human pose estimation on COCO (Lin et al. 2014). |
| Dataset Splits | Yes | ImageNet and Analysis: We further analyze different attention settings on ImageNet-2012 (Russakovsky et al. 2015). Training images are resized with the shorter side randomly sampled in [256, 480], and a 224×224 crop is sampled from the image or its horizontal flip. [...] Performance on ImageNet validation set (10-crop test). [...] Pose Estimation: We further evaluate TDAF on the pose estimation task using the COCO-2017 dataset. [...] The results are shown in Tab. 5. [...] Table 5: Performance (AP in %) of Simple Pose (Xiao, Wu, and Wei 2018) on the COCO-2017 validation set. |
| Hardware Specification | Yes | On Titan XP with 256 batch size, the forward time of Res50 and its attention version is 0.015s and 0.018s, and 0.031s vs 0.035s for Res101. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We use an SGD optimizer with a 256 mini-batch on 8 GPUs to train. The parameters are initialized by Kaiming initialization proposed in (He et al. 2015). We modify the backbone of FCOS with our TDAF. The data pre-processing and training methods follow the original FCOS (2× training schedule with multi-scale input). |
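The setup row above mentions two reusable ingredients, Kaiming (He) initialization and SGD training, without giving full hyperparameters. Below is a minimal NumPy sketch of both, for orientation only: the learning rate and momentum values are illustrative assumptions, not taken from the paper, and the real experiments would use a deep-learning framework's built-in versions.

```python
import numpy as np

def kaiming_normal(fan_in, fan_out, rng):
    """He et al. (2015) initialization for ReLU networks:
    weights drawn from N(0, sqrt(2 / fan_in))."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def sgd_step(w, grad, velocity=None, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update. lr and momentum here are
    common ImageNet defaults, assumed rather than quoted."""
    if velocity is None:
        velocity = np.zeros_like(w)
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

rng = np.random.default_rng(0)
w = kaiming_normal(512, 256, rng)   # empirical std ≈ sqrt(2/512) ≈ 0.0625
w, v = sgd_step(w, grad=np.zeros_like(w))
```

In practice this corresponds to a framework's stock initializer and optimizer applied to every layer of the TDAF backbone; the sketch only makes the two formulas concrete.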