Fast Learning of Temporal Action Proposal via Dense Boundary Generator

Authors: Chuming Lin, Jian Li, Yabiao Wang, Ying Tai, Donghao Luo, Zhipeng Cui, Chengjie Wang, Jilin Li, Feiyue Huang, Rongrong Ji
Pages: 11499-11506

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on popular benchmarks ActivityNet-1.3 and THUMOS14 demonstrate the superiority of DBG over the state-of-the-art proposal generator (e.g., MGG and BMN).
Researcher Affiliation | Collaboration | Chuming Lin¹, Jian Li¹, Yabiao Wang¹, Ying Tai¹, Donghao Luo¹, Zhipeng Cui¹, Chengjie Wang¹, Jilin Li¹, Feiyue Huang¹, Rongrong Ji². ¹Youtu Lab, Tencent; ²Xiamen University, China. {chuminglin, swordli, caseywang, yingtai, michaelluo, zhipengcui, jasoncjwang, jerolinli, garyhuang}@tencent.com, rrji@xmu.edu.cn
Pseudocode | No | The paper describes its method through textual descriptions, architectural diagrams (Figures 1, 3, 4), tables (Table 1), and mathematical formulas, but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | https://github.com/TencentYoutuResearch/ActionDetection-DBG
Open Datasets | Yes | Extensive experiments on popular benchmarks ActivityNet-1.3 and THUMOS14 demonstrate the superiority of DBG over the state-of-the-art proposal generator (e.g., MGG and BMN). ActivityNet-1.3. It is a large-scale dataset containing 19,994 videos with 200 activity classes for action recognition, temporal proposal generation and detection. The quantity ratio of training, validation and testing sets satisfies 2:1:1. THUMOS14. This dataset has 1,010 validation videos and 1,574 testing videos with 20 classes. For the action proposal or detection task, there are 200 validation videos and 212 testing videos labeled with temporal annotations.
Dataset Splits | Yes | ActivityNet-1.3. The quantity ratio of training, validation and testing sets satisfies 2:1:1. THUMOS14. This dataset has 1,010 validation videos and 1,574 testing videos with 20 classes. For the action proposal or detection task, there are 200 validation videos and 212 testing videos labeled with temporal annotations. We train our model on the validation set and evaluate on the test set.
Hardware Specification | Yes | For a 3-minute video processed on Nvidia GTX 1080Ti, our inference speed accelerates a lot.
Software Dependencies | No | The paper mentions using "Adam for optimization" and refers to using "two-stream network" and "C3D" features, but it does not specify version numbers for any software, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | For ActivityNet-1.3, we resize video feature sequence by linear interpolation and set L = 100. For THUMOS14, we slide the window on video feature sequence with overlap = 0.5 and L = 128. When training DBG, we use Adam for optimization. The batch size is set to 16. The learning rate is set to 10^-3 for the first 10 epochs, and we decay it to 10^-4 for another 2 epochs. For Soft-NMS, we set the threshold to 0.8 on ActivityNet-1.3 and 0.65 on THUMOS14. ϵ in Gaussian function is set to 0.75 on both temporal proposal generation datasets.
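
The Soft-NMS settings quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes the commonly used Gaussian Soft-NMS form, in which a proposal whose temporal IoU with the currently selected proposal exceeds the threshold has its score multiplied by exp(-IoU² / ϵ). The function names `temporal_iou` and `soft_nms`, the `top_k` parameter, and the example proposals are hypothetical and are not taken from the paper or the DBG repository, whose implementation may differ in detail.

```python
import math


def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(proposals, threshold=0.8, epsilon=0.75, top_k=100):
    """Gaussian Soft-NMS over (start, end, score) proposals.

    Assumed form: scores of proposals overlapping the selected one by
    more than `threshold` are decayed by exp(-iou**2 / epsilon).
    Returns the selected proposals in selection order.
    """
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        # Pick the highest-scoring remaining proposal.
        best = max(range(len(remaining)), key=lambda i: remaining[i][2])
        current = remaining.pop(best)
        kept.append(tuple(current))
        # Decay, rather than discard, heavily overlapping proposals.
        for p in remaining:
            iou = temporal_iou(current, p)
            if iou > threshold:
                p[2] *= math.exp(-(iou ** 2) / epsilon)
    return kept


# Hypothetical proposals: (start, end, confidence score).
example = [(0.0, 10.0, 0.9), (0.0, 9.0, 0.8), (20.0, 30.0, 0.7)]
result = soft_nms(example, threshold=0.8, epsilon=0.75)
```

In this example the second proposal overlaps the first with a temporal IoU of 0.9, so its score is decayed and it is selected last, while the non-overlapping third proposal keeps its score. Passing `threshold=0.65` would correspond to the THUMOS14 setting quoted above.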