A^2-Nets: Double Attention Networks

Authors: Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive ablation studies and experiments on both image and video recognition tasks for evaluating its performance.
Researcher Affiliation | Collaboration | Yunpeng Chen (National University of Singapore, chenyunpeng@u.nus.edu); Yannis Kalantidis (Facebook Research, yannisk@fb.com); Jianshu Li (National University of Singapore, jianshu@u.nus.edu); Shuicheng Yan (Qihoo 360 AI Institute and National University of Singapore, eleyans@nus.edu.sg); Jiashi Feng (National University of Singapore, elefjia@nus.edu.sg)
Pseudocode | No | The paper provides a computational graph in Figure 2, but it includes no pseudocode or clearly labeled algorithm blocks. (A hedged PyTorch sketch of the block follows the table.)
Open Source Code | No | Code and trained models will be released on GitHub soon.
Open Datasets | Yes | Kinetics [12] video recognition dataset, ImageNet-1k [13] image classification dataset, UCF-101 [20].
Dataset Splits | Yes | For image classification, standard single-model, single 224×224 center-crop validation accuracy is reported, following [9, 10]. UCF-101 contains about 13,320 videos from 101 action categories and has three train/test splits. (See the validation-transform sketch after the table.)
Hardware Specification | Yes | All experiments are conducted using a distributed K80 GPU cluster.
Software Dependencies | No | We use MXNet [3] for the image classification experiments and PyTorch [18] for the video classification tasks. The paper names the software packages but does not specify their version numbers.
Experiment Setup | Yes | The base learning rate is set to 0.2, reduced by a factor of 0.1 at the 20k-th and 30k-th iterations, and training terminates at the 37k-th iteration. We use 32 GPUs per experiment with a total batch size of 512, training from scratch. In another configuration, the base learning rate is set to 0.1 and decreased by a factor of 0.1 when training accuracy saturates. The video network takes 8 frames (sampling stride: 8) as input and is trained for 32k iterations with a total batch size of 512 using 64 GPUs; its initial learning rate is set to 0.04 and decreased in a stepwise manner when training accuracy saturates. (A scheduler sketch for the first schedule follows the table.)
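
Since the paper ships only a computational graph (Figure 2) rather than pseudocode, here is a minimal PyTorch sketch of the double-attention block it depicts: feature gathering via second-order attention pooling, then feature distribution back to every location. The class name DoubleAttention, the channel parameters c_m and c_n, and the residual output convolution are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleAttention(nn.Module):
    """Sketch of the A^2 block: gather global descriptors, then distribute."""

    def __init__(self, in_channels: int, c_m: int, c_n: int):
        super().__init__()
        # Three 1x1 convolutions produce the features (A), the gathering
        # attention maps (B), and the distribution weights (V) of Figure 2.
        self.conv_a = nn.Conv2d(in_channels, c_m, kernel_size=1)
        self.conv_b = nn.Conv2d(in_channels, c_n, kernel_size=1)
        self.conv_v = nn.Conv2d(in_channels, c_n, kernel_size=1)
        # Project back to the input width so the block can be residual.
        self.conv_out = nn.Conv2d(c_m, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        a = self.conv_a(x).view(n, -1, h * w)                     # (n, c_m, h*w)
        # Softmax over positions: each of the c_n attention maps sums to 1.
        b = F.softmax(self.conv_b(x).view(n, -1, h * w), dim=-1)  # (n, c_n, h*w)
        # Softmax over the c_n descriptors available at each position.
        v = F.softmax(self.conv_v(x).view(n, -1, h * w), dim=1)   # (n, c_n, h*w)
        # Step 1 -- gathering: second-order attention pooling.
        g = torch.bmm(a, b.transpose(1, 2))                       # (n, c_m, c_n)
        # Step 2 -- distribution: each position selects its mix of descriptors.
        z = torch.bmm(g, v).view(n, -1, h, w)                     # (n, c_m, h, w)
        return x + self.conv_out(z)                               # residual (assumption)
```

For example, DoubleAttention(512, c_m=128, c_n=128) would drop into a backbone stage that outputs 512 channels.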
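
For the single 224×224 center-crop validation protocol noted in the Dataset Splits row, the conventional torchvision pipeline looks like the sketch below. The 256-pixel short-side resize and the ImageNet normalization statistics are the common convention referenced via [9, 10], not values stated in the paper.

```python
from torchvision import transforms

# Standard single-crop ImageNet validation preprocessing (a sketch of the
# usual convention; the exact resize size is an assumption).
val_transform = transforms.Compose([
    transforms.Resize(256),       # resize the short side to 256 pixels
    transforms.CenterCrop(224),   # a single 224x224 center crop
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # ImageNet statistics
])
```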
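
The first schedule in the Experiment Setup row (base learning rate 0.2, 0.1× decay at the 20k-th and 30k-th iterations, termination at 37k) maps directly onto an iteration-level step scheduler. A minimal sketch under assumptions: the placeholder model and the momentum and weight-decay values are common defaults, not taken from the excerpt.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Conv2d(3, 64, 3)  # placeholder module (assumption)
# Base LR 0.2 as reported; momentum and weight decay are assumed defaults.
optimizer = SGD(model.parameters(), lr=0.2, momentum=0.9, weight_decay=1e-4)
# Multiply the learning rate by 0.1 at the 20k-th and 30k-th iterations.
scheduler = MultiStepLR(optimizer, milestones=[20_000, 30_000], gamma=0.1)

for iteration in range(37_000):   # terminate at the 37k-th iteration
    # ... forward pass, loss.backward() ...
    optimizer.step()
    scheduler.step()              # stepped per iteration, not per epoch
```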