Weakly-Supervised Temporal Action Localization by Inferring Salient Snippet-Feature
Authors: Wulian Yun, Mengshi Qi, Chuanming Wang, Huadong Ma
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two publicly available datasets, i.e., THUMOS14 and ActivityNet v1.3, demonstrate our proposed method achieves significant improvements compared to the state-of-the-art methods. |
| Researcher Affiliation | Academia | Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, China {yunwl,qms,wcm,mhd}@bupt.edu.cn |
| Pseudocode | No | The paper describes the method in prose but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/wuli55555/ISSF. |
| Open Datasets | Yes | We conduct our experiments on the two commonly-used benchmark datasets, including THUMOS14 (Jiang et al. 2014) and ActivityNet v1.3 (Heilbron et al. 2015). |
| Dataset Splits | Yes | For THUMOS14 dataset, we use the validation videos to train our model and test videos for evaluation. ActivityNet v1.3 contains 10,024 training videos, 4,926 validation videos, and 5,044 testing videos of 200 action categories. Following (Lee et al. 2021; Huang, Wang, and Li 2022), we use the training videos to train our model and validation videos for evaluation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch framework' and 'Adam optimizer' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The scaling factor r is set to 4. The hyper-parameters θ and λ are set to 0.2 and 0.1, respectively. For THUMOS14 dataset, we train 180 epochs with a learning rate of 0.00005, the batch size is set to 10, σ is set to 0.88, and K is set to 50% T, where T is the number of video snippets. For ActivityNet v1.3 dataset, we train 100 epochs with a learning rate of 0.0001, the batch size is set to 32, σ is set to 0.9, and K is set to 90% T. |
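
The Experiment Setup row can be collected into a small configuration object, which makes the per-dataset differences easier to compare. The following is a minimal sketch, not the authors' code: the class name `ExperimentConfig`, its field names, and the `top_k` helper are hypothetical, and rounding K down to an integer (with a minimum of 1) is an assumption, since the table only states K as a percentage of T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyper-parameters as reported in the Experiment Setup row.

    All field names are hypothetical; only the values come from the table.
    """
    dataset: str
    epochs: int
    learning_rate: float
    batch_size: int
    sigma: float
    k_ratio: float       # K expressed as a fraction of T
    r: int = 4           # scaling factor r (shared by both datasets)
    theta: float = 0.2   # hyper-parameter theta
    lam: float = 0.1     # hyper-parameter lambda

    def top_k(self, num_snippets: int) -> int:
        """Return K for a video with T = num_snippets snippets.

        Assumption: K is floored to an integer and is at least 1.
        """
        return max(1, int(num_snippets * self.k_ratio))


THUMOS14 = ExperimentConfig(
    dataset="THUMOS14", epochs=180, learning_rate=5e-5,
    batch_size=10, sigma=0.88, k_ratio=0.5,
)
ACTIVITYNET_V13 = ExperimentConfig(
    dataset="ActivityNet v1.3", epochs=100, learning_rate=1e-4,
    batch_size=32, sigma=0.9, k_ratio=0.9,
)
```

For example, `THUMOS14.top_k(750)` would give the K used for a 750-snippet video under the 50% setting, while `ACTIVITYNET_V13.top_k(100)` applies the 90% setting.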