HR-Pro: Point-Supervised Temporal Action Localization via Hierarchical Reliability Propagation

Authors: Huaxin Zhang, Xiang Wang, Xiaohao Xu, Zhiwu Qing, Changxin Gao, Nong Sang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our HR-Pro achieves state-of-the-art performance on multiple challenging benchmarks, including an impressive average mAP of 60.3% on THUMOS14. Notably, our HR-Pro largely surpasses all previous point-supervised methods, and even outperforms several competitive fully-supervised methods. Code will be available at https://github.com/pipixin321/HR-Pro.
Researcher Affiliation | Academia | (1) Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; (2) University of Michigan, Ann Arbor
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code will be available at https://github.com/pipixin321/HR-Pro.
Open Datasets | Yes | Datasets. We conduct our experiments on four popular action localization datasets, with only point-level annotations used for training. In our experiments, we utilize the point-level annotations provided in (Lee and Byun 2021) for fair comparison. (1) THUMOS14 (Idrees et al. 2017) provides 413 untrimmed sports videos for 20 action categories, including 200 videos for training and 213 videos for testing... (2) GTEA (Fathi, Ren, and Rehg 2011) provides 28 videos of 7 fine-grained daily activities in a kitchen. (3) BEOID (Damen et al. 2014) provides 58 video samples with 30 action classes and an average duration of 60s. (4) ActivityNet 1.3 (Caba Heilbron et al. 2015) provides 10,024 training, 4,926 validation, and 5,044 test videos with 200 action classes.
Dataset Splits | Yes | (1) THUMOS14 (Idrees et al. 2017) provides 413 untrimmed sports videos for 20 action categories, including 200 videos for training and 213 videos for testing... (4) ActivityNet 1.3 (Caba Heilbron et al. 2015) provides 10,024 training, 4,926 validation, and 5,044 test videos with 200 action classes.
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used to run its experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer and the I3D network, but does not provide specific software names with version numbers for libraries or frameworks (e.g., Python version, PyTorch version).
Experiment Setup | Yes | Implementation Details. For a fair comparison, we follow the existing method (Lee and Byun 2021) to divide each video into 16-frame snippets and use a two-stream I3D network pretrained on Kinetics-400 (Carreira and Zisserman 2017) as the feature extractor. For THUMOS14, we use the Adam optimizer with a learning rate of 1e-4 and a weight decay of 1e-3, and the batch size is set to 16. The hyper-parameters are set by grid search: τ = 0.1, µ = 0.999, λ1 = λ2 = 1. The video-level threshold is set to 0.5, θP spans from 0 to 0.25 with a step size of 0.05, and θA spans from 0 to 0.1 with a step size of 0.01. The number of RAB is set to 2.
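The Experiment Setup row above pins down the paper's THUMOS14 training configuration. As a reading aid, the following is a minimal sketch of that configuration in Python; the TrainConfig class and frange helper are hypothetical names introduced here for illustration, not the authors' code (their implementation is released at https://github.com/pipixin321/HR-Pro).

```python
# Minimal sketch of the THUMOS14 training configuration quoted above.
# TrainConfig and frange are hypothetical illustrations, not HR-Pro code.
from dataclasses import dataclass, field

def frange(start: float, stop: float, step: float) -> list[float]:
    """Inclusive float range, used for the grid-searched thresholds."""
    n = round((stop - start) / step)
    return [round(start + i * step, 4) for i in range(n + 1)]

@dataclass
class TrainConfig:
    snippet_len: int = 16       # frames per snippet fed to the I3D extractor
    lr: float = 1e-4            # Adam learning rate
    weight_decay: float = 1e-3
    batch_size: int = 16
    tau: float = 0.1            # τ, set by grid search
    mu: float = 0.999           # µ, set by grid search
    lambda1: float = 1.0        # λ1, loss weight
    lambda2: float = 1.0        # λ2, loss weight
    video_thresh: float = 0.5   # video-level threshold
    num_rab: int = 2            # number of RABs
    # θP: 0 to 0.25 with step 0.05; θA: 0 to 0.1 with step 0.01
    theta_p: list[float] = field(default_factory=lambda: frange(0.0, 0.25, 0.05))
    theta_a: list[float] = field(default_factory=lambda: frange(0.0, 0.1, 0.01))

cfg = TrainConfig()
print(cfg.theta_p)  # [0.0, 0.05, 0.1, 0.15, 0.2, 0.25]
print(cfg.theta_a)  # [0.0, 0.01, ..., 0.1]
```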