A^2-Nets: Double Attention Networks
Authors: Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance. |
| Researcher Affiliation | Collaboration | Yunpeng Chen, National University of Singapore (chenyunpeng@u.nus.edu); Yannis Kalantidis, Facebook Research (yannisk@fb.com); Jianshu Li, National University of Singapore (jianshu@u.nus.edu); Shuicheng Yan, Qihoo 360 AI Institute / National University of Singapore (eleyans@nus.edu.sg); Jiashi Feng, National University of Singapore (elefjia@nus.edu.sg) |
| Pseudocode | No | The paper provides a computational graph in Figure 2, but it does not include pseudocode or clearly labeled algorithm blocks (a minimal code sketch of the double attention block is given after this table). |
| Open Source Code | No | Code and trained models will be released on GitHub soon. |
| Open Datasets | Yes | Kinetics [12] video recognition dataset, ImageNet-1k [13] image classification dataset, UCF-101 [20] |
| Dataset Splits | Yes | For image classification, we report standard single model single 224×224 center crop validation accuracy, following [9, 10]. The UCF-101 contains about 13,320 videos from 101 action categories and has three train/test splits. |
| Hardware Specification | Yes | All experiments are conducted using a distributed K80 GPU cluster |
| Software Dependencies | No | We use MXNet [3] to experiment on the image classification task, and PyTorch [18] on video classification tasks. The paper mentions the names of the software used but does not specify their version numbers. |
| Experiment Setup | Yes | The base learning rate is set to 0.2 and is reduced with a factor of 0.1 at the 20k-th, 30k-th iterations, and terminated at the 37k-th iteration. We use 32 GPUs per experiment with a total batch size of 512 training from scratch. The base learning rate is set to 0.1 and decreases with a factor of 0.1 when training accuracy is saturated. The network takes 8 frames (sampling stride: 8) as input and is trained for 32k iterations with a total batch size of 512 using 64 GPUs. The initial learning rate is set to 0.04 and decreased in a stepwise manner when training accuracy is saturated. |
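
Since the paper describes the double attention block only through the computational graph in Figure 2, the following is a minimal PyTorch sketch of that block as we read it: features are gathered globally via second-order attention pooling and then distributed back to every location. The bottleneck sizes `c_m` and `c_n` and the 2D (image) formulation are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoubleAttention(nn.Module):
    """Sketch of an A^2 block: gather global descriptors, then distribute them."""

    def __init__(self, in_channels: int, c_m: int, c_n: int):
        super().__init__()
        self.conv_a = nn.Conv2d(in_channels, c_m, kernel_size=1)  # feature maps A
        self.conv_b = nn.Conv2d(in_channels, c_n, kernel_size=1)  # attention maps B
        self.conv_v = nn.Conv2d(in_channels, c_n, kernel_size=1)  # distribution maps V
        self.conv_out = nn.Conv2d(c_m, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        a = self.conv_a(x).view(n, -1, h * w)       # (N, c_m, HW)
        b = self.conv_b(x).view(n, -1, h * w)       # (N, c_n, HW)
        v = self.conv_v(x).view(n, -1, h * w)       # (N, c_n, HW)

        # Step 1: feature gathering -- softmax over spatial positions,
        # then bilinear pooling A * softmax(B)^T -> global descriptors G.
        attn = F.softmax(b, dim=-1)                 # (N, c_n, HW)
        g = torch.bmm(a, attn.transpose(1, 2))      # (N, c_m, c_n)

        # Step 2: feature distribution -- softmax over the c_n descriptors
        # at each location selects a mixture of global features.
        distrib = F.softmax(v, dim=1)               # (N, c_n, HW)
        z = torch.bmm(g, distrib)                   # (N, c_m, HW)

        z = self.conv_out(z.view(n, -1, h, w))      # project back to input channels
        return x + z                                # residual connection
```

The block can be dropped into a residual backbone wherever a global-context module is wanted, e.g. `DoubleAttention(in_channels=512, c_m=128, c_n=128)` applied to an intermediate feature map.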
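
The Experiment Setup row quotes a stepwise learning-rate schedule for the image classification runs (base rate 0.2, reduced by a factor of 0.1 at the 20k-th and 30k-th iterations, stopped at 37k). The sketch below shows one way to reproduce that schedule in PyTorch; the optimizer choice, momentum value, and the stand-in model are assumptions made only to keep the example runnable.

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=0.2, momentum=0.9)  # momentum is an assumption
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20_000, 30_000], gamma=0.1
)

for iteration in range(37_000):
    # ... forward pass, loss, backward pass would go here ...
    optimizer.step()
    scheduler.step()  # stepped per iteration, matching the iteration-based schedule
```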