Graph Attention Based Proposal 3D ConvNets for Action Detection
Authors: Jun Li, Xianglong Liu, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song
AAAI 2020, pp. 4626-4633
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two proposal 3D ConvNets based models (P-C3D and P-ResNet) and two popular action detection benchmarks (THUMOS 2014, ActivityNet v1.3) demonstrate the state-of-the-art performance achieved by our method. Particularly, P-C3D embedded with our module achieves a 3.7% average mAP improvement on the THUMOS 2014 dataset compared to the original model. See also the sections "Comparison with State-of-the-art Methods" and "Ablation Study". |
| Researcher Affiliation | Academia | Jun Li (1), Xianglong Liu (1,2), Zhuofan Zong (1), Wanru Zhao (1), Mingyuan Zhang (1), Jingkuan Song (3). (1) State Key Lab of Software Development Environment, Beihang University, Beijing, China; (2) Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China; (3) Innovation Center, University of Electronic Science and Technology of China, Chengdu, China |
| Pseudocode | No | The paper describes methods and processes in narrative text and figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references a GitHub link (https://github.com/sunnyxiaohu/R-C3D.pytorch) in a footnote, stating: "We implement our graph attention module with framewise constraint mainly on R-C3D model, which is written in pytorch." However, this link points to the baseline R-C3D model and not the authors' specific modifications or proposed AGCN module, nor does it explicitly state their code is open-source. |
| Open Datasets | Yes | THUMOS 2014 (Jiang et al. 2014); ActivityNet v1.3 (Fabian Caba Heilbron and Niebles 2015) |
| Dataset Splits | Yes | THUMOS 2014... It includes 2765 trimmed videos of these 20 actions in UCF101 for training, and 200 and 213 untrimmed videos with temporal annotations for the validation and test sets respectively. ActivityNet v1.3... It is divided into training, validation and test sets with ratio 2:1:1. It has 10024, 4926 and 5044 videos for the training, validation and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for running the experiments. |
| Software Dependencies | No | The paper mentions "pytorch" as the framework used but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For both models and datasets, we decompress the videos into frames at 25 frames per second (fps), and create a buffer of 768 frames. For THUMOS 2014, the learning rate is kept fixed at 10^-4 for the first 3 epochs and is decreased to 10^-5 for the last 2 epochs. We choose 10 anchor segments with specific scale values [2, 4, 5, 6, 8, 9, 10, 12, 14, 16]. We use the Sports-1M pretrained model to initialize the training. For ActivityNet v1.3, the learning rate is kept fixed at 10^-4 for the first 6 epochs and is decreased to 10^-5 for the last 2 epochs. We choose 37 anchor segments with specific scale values [1, 1.25, 1.5, 1.75, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, 32, 36, 40, 44, 52, 60, 68, 76, 84, 92, 100]. The learning rate of our module is 10 times larger than the basic model for both datasets. For THUMOS 2014, the learning rate is kept fixed at 10^-4 for the first 4 epochs and is decreased to 10^-5 for the last 2 epochs. We use the UCF-101 pretrained model to initialize the training. |
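The learning-rate schedule and anchor scales quoted above can be sketched as plain Python. This is a minimal illustration, not the authors' code; the function and variable names are assumptions introduced here.

```python
# Minimal sketch (not the authors' code) of the training schedule quoted in
# the table: base learning rate 1e-4, later dropped to 1e-5, with the graph
# attention module trained at 10x the base rate. Names are illustrative.

def lr_for_epoch(epoch, drop_epoch, base_lr=1e-4, decayed_lr=1e-5):
    """Step schedule: base_lr before drop_epoch, decayed_lr from then on."""
    return base_lr if epoch < drop_epoch else decayed_lr

# THUMOS 2014 (P-C3D): fixed for the first 3 epochs, decreased for the last 2.
thumos_lrs = [lr_for_epoch(e, drop_epoch=3) for e in range(5)]

# ActivityNet v1.3: fixed for the first 6 epochs, decreased for the last 2.
anet_lrs = [lr_for_epoch(e, drop_epoch=6) for e in range(8)]

# The proposed module's learning rate is 10x the base model's at every epoch.
module_lrs = [10 * lr for lr in thumos_lrs]

# Anchor segment scales quoted from the paper (THUMOS 2014 uses 10 anchors).
THUMOS_ANCHOR_SCALES = [2, 4, 5, 6, 8, 9, 10, 12, 14, 16]
assert len(THUMOS_ANCHOR_SCALES) == 10
```

The step schedule simply switches between two constant rates at the reported drop epoch, so either per-dataset schedule reduces to choosing a different `drop_epoch`.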