Progressive Boundary Refinement Network for Temporal Action Detection
Authors: Qinying Liu, Zilei Wang
AAAI 2020, pp. 11612–11619
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally evaluate the proposed PBRNet and comprehensively investigate the effect of its main components. The results show PBRNet achieves state-of-the-art detection performance on two popular benchmarks, THUMOS'14 and ActivityNet, while also possessing a high inference speed. |
| Researcher Affiliation | Academia | Qinying Liu, Zilei Wang; Department of Automation, University of Science and Technology of China; lydyc@mail.ustc.edu.cn, zlwang@ustc.edu.cn |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1, Figure 2, Figure 3) and descriptive text for its components, but it does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | The THUMOS'14 dataset (Jiang et al. 2014) contains 200 untrimmed videos (including 3,007 action instances) in the validation set and 213 untrimmed videos (including 3,358 action instances) in the test set, spanning 20 action categories widely used for temporal action detection. ActivityNet v1.3 (Caba Heilbron et al. 2015) contains 10,024, 4,926, and 5,044 videos from 200 classes in the training, validation, and test sets, respectively. |
| Dataset Splits | Yes | The THUMOS'14 dataset (Jiang et al. 2014) contains 200 untrimmed videos (3,007 action instances) in the validation set and 213 untrimmed videos (3,358 action instances) in the test set; we use the validation set for training and the test set for evaluation. ActivityNet v1.3 (Caba Heilbron et al. 2015) contains 10,024, 4,926, and 5,044 videos from 200 classes in the training, validation, and test sets, respectively; we evaluate our model on the validation set, as in previous works (Shou et al. 2017; Gao, Chen, and Nevatia 2018; Xie et al. 2018). A hedged sketch of this split assignment appears after the table. |
| Hardware Specification | Yes | Here our model is evaluated on an NVIDIA GeForce GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions using I3D as a visual encoder and that the backbone is pretrained, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow). |
| Experiment Setup | Yes | In our experiments, H = W = 96 is set. On THUMOS'14, we sample both RGB and optical-flow frames at 10 frames per second (fps). The length of each clip L is set as 256 frames (i.e., about 25.6 seconds). ... On ActivityNet, we sample frames at only 3 fps. Accordingly, L is set as 768 (i.e., covering 256 seconds of a video). ... We set 6 feature layers in the two pyramids. ... The batch size is set as 1, thus we freeze all batch normalization layers. ... In our implementation, h_cp = 0.5, h_rp = 0.6, and h_fg = 0.7 are used. The weight factor γ is empirically set as 1. ... η is empirically set as 10. A hedged configuration sketch collecting these values follows the table. |
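
The split assignment above is easy to misread, since THUMOS'14 is trained on its validation set. The minimal Python sketch below makes the mapping explicit; the `DATASET_SPLITS` layout and the `resolve_split` helper are illustrative names, while the video counts and split choices come from the quoted passages.

```python
# Hypothetical mapping of training/evaluation phases to concrete splits,
# as described in the PBRNet paper. Only the counts and split choices are
# taken from the paper; the structure itself is illustrative.
DATASET_SPLITS = {
    "THUMOS14": {
        # 20 action categories; trained on the VALIDATION set,
        # evaluated on the TEST set.
        "train": {"split": "validation", "videos": 200, "instances": 3007},
        "eval":  {"split": "test",       "videos": 213, "instances": 3358},
    },
    "ActivityNet-v1.3": {
        # 200 classes; evaluation reported on the validation set, following
        # Shou et al. 2017; Gao, Chen, and Nevatia 2018; Xie et al. 2018.
        "train": {"split": "training",   "videos": 10024},
        "eval":  {"split": "validation", "videos": 4926},
        # The 5,044-video test set is not used for evaluation in the paper.
    },
}

def resolve_split(dataset: str, phase: str) -> str:
    """Return the concrete split name used for a given phase ('train' or 'eval')."""
    return DATASET_SPLITS[dataset][phase]["split"]

assert resolve_split("THUMOS14", "train") == "validation"
assert resolve_split("ActivityNet-v1.3", "eval") == "validation"
```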
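
The experiment-setup values are scattered through the quoted passage, so the sketch below collects them into a single configuration object. The dataclass layout and field names (`fps`, `clip_len`, `h_cp`, ...) are assumptions made for illustration; only the values are taken from the paper, and the subscripts cp/rp/fg are read as the coarse pyramidal, refined pyramidal, and fine-grained detection modules, matching the paper's module names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PBRNetConfig:
    """Hypothetical container for the hyperparameters quoted in the paper."""
    # Per-dataset frame sampling.
    fps: int                 # sampling rate in frames per second
    clip_len: int            # clip length L, in frames
    # Settings shared across both benchmarks.
    height: int = 96         # H = W = 96 input resolution
    width: int = 96
    pyramid_layers: int = 6  # 6 feature layers in each of the two pyramids
    batch_size: int = 1      # batch size 1, so batch-norm layers are frozen
    h_cp: float = 0.5        # threshold for the coarse pyramidal module (assumed reading)
    h_rp: float = 0.6        # threshold for the refined pyramidal module (assumed reading)
    h_fg: float = 0.7        # threshold for the fine-grained module (assumed reading)
    gamma: float = 1.0       # weight factor γ, empirically set as 1
    eta: float = 10.0        # η, empirically set as 10

THUMOS14_CFG = PBRNetConfig(fps=10, clip_len=256)    # 256 / 10 = 25.6 s per clip
ACTIVITYNET_CFG = PBRNetConfig(fps=3, clip_len=768)  # 768 / 3 = 256 s per clip

assert THUMOS14_CFG.clip_len / THUMOS14_CFG.fps == 25.6
assert ACTIVITYNET_CFG.clip_len / ACTIVITYNET_CFG.fps == 256
```

Note the coupling between the last two shared settings: with a batch size of 1, per-batch statistics are too noisy to estimate, which is why the paper freezes all batch normalization layers.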