Weakly-Supervised Temporal Action Localization by Inferring Salient Snippet-Feature

Authors: Wulian Yun, Mengshi Qi, Chuanming Wang, Huadong Ma

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments on two publicly available datasets, i.e., THUMOS14 and ActivityNet v1.3, demonstrate our proposed method achieves significant improvements compared to the state-of-the-art methods." |
| Researcher Affiliation | Academia | "Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, China {yunwl,qms,wcm,mhd}@bupt.edu.cn" |
| Pseudocode | No | The paper describes the method in prose but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Our source code is available at https://github.com/wuli55555/ISSF." |
| Open Datasets | Yes | "We conduct our experiments on the two commonly-used benchmark datasets, including THUMOS14 (Jiang et al. 2014) and ActivityNet v1.3 (Heilbron et al. 2015)." |
| Dataset Splits | Yes | "For THUMOS14 dataset, we use the validation videos to train our model and test videos for evaluation. ActivityNet v1.3 contains 10,024 training videos, 4,926 validation videos, and 5,044 testing videos of 200 action categories. Following (Lee et al. 2021; Huang, Wang, and Li 2022), we use the training videos to train our model and validation videos for evaluation." |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions the PyTorch framework and the Adam optimizer but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "The scaling factor r is set to 4. The hyper-parameters θ and λ are set to 0.2 and 0.1, respectively. For THUMOS14 dataset, we train 180 epochs with a learning rate of 0.00005, the batch size is set to 10, σ is set to 0.88, and K is set to 50% T, where T is the number of video snippets. For Activity Net v1.3 dataset, we train 100 epochs with a learning rate of 0.0001, the batch size is set to 32, σ is set to 0.9, and K is set to 90% T." |
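The hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The dictionary layout and key names below are illustrative (they are not taken from the authors' released code); only the numeric values come from the paper's text.

```python
# Hyper-parameters reported in the paper's experiment setup.
# Key names are hypothetical; values are as stated in the paper.
COMMON = {
    "scaling_factor_r": 4,   # scaling factor r
    "theta": 0.2,            # hyper-parameter θ
    "lambda": 0.1,           # hyper-parameter λ
}

PER_DATASET = {
    "THUMOS14": {
        "epochs": 180,
        "learning_rate": 5e-5,
        "batch_size": 10,
        "sigma": 0.88,
        # K is set to 50% of T, the number of video snippets
        "K_fraction_of_T": 0.50,
    },
    "ActivityNet_v1.3": {
        "epochs": 100,
        "learning_rate": 1e-4,
        "batch_size": 32,
        "sigma": 0.9,
        # K is set to 90% of T
        "K_fraction_of_T": 0.90,
    },
}
```

Note that the two datasets differ in every per-dataset setting, so any reproduction attempt should select the configuration by dataset rather than reuse one set of values.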