Weakly-Supervised Temporal Action Localization by Inferring Salient Snippet-Feature

Authors: Wulian Yun, Mengshi Qi, Chuanming Wang, Huadong Ma

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments on two publicly available datasets, i.e., THUMOS14 and ActivityNet v1.3, demonstrate our proposed method achieves significant improvements compared to the state-of-the-art methods." |
| Researcher Affiliation | Academia | "Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, China {yunwl,qms,wcm,mhd}@bupt.edu.cn" |
| Pseudocode | No | The paper describes the method in prose but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Our source code is available at https://github.com/wuli55555/ISSF." |
| Open Datasets | Yes | "We conduct our experiments on the two commonly-used benchmark datasets, including THUMOS14 (Jiang et al. 2014) and ActivityNet v1.3 (Heilbron et al. 2015)." |
| Dataset Splits | Yes | "For THUMOS14 dataset, we use the validation videos to train our model and test videos for evaluation. ActivityNet v1.3 contains 10,024 training videos, 4,926 validation videos, and 5,044 testing videos of 200 action categories. Following (Lee et al. 2021; Huang, Wang, and Li 2022), we use the training videos to train our model and validation videos for evaluation." |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions the PyTorch framework and the Adam optimizer but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "The scaling factor r is set to 4. The hyper-parameters θ and λ are set to 0.2 and 0.1, respectively. For THUMOS14 dataset, we train 180 epochs with a learning rate of 0.00005, the batch size is set to 10, σ is set to 0.88, and K is set to 50% T, where T is the number of video snippets. For Activity Net v1.3 dataset, we train 100 epochs with a learning rate of 0.0001, the batch size is set to 32, σ is set to 0.9, and K is set to 90% T." |
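The hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The dictionary layout and key names below are illustrative (they are not taken from the authors' released code); only the numeric values come from the paper's text.

```python
# Hyper-parameters reported in the paper's experiment setup.
# Key names are hypothetical; values are as stated in the paper.
COMMON = {
    "scaling_factor_r": 4,   # scaling factor r
    "theta": 0.2,            # hyper-parameter θ
    "lambda": 0.1,           # hyper-parameter λ
}

PER_DATASET = {
    "THUMOS14": {
        "epochs": 180,
        "learning_rate": 5e-5,
        "batch_size": 10,
        "sigma": 0.88,
        # K is set to 50% of T, the number of video snippets
        "K_fraction_of_T": 0.50,
    },
    "ActivityNet_v1.3": {
        "epochs": 100,
        "learning_rate": 1e-4,
        "batch_size": 32,
        "sigma": 0.9,
        # K is set to 90% of T
        "K_fraction_of_T": 0.90,
    },
}
```

Note that the two datasets differ in every per-dataset setting, so any reproduction attempt should select the configuration by dataset rather than reuse one set of values.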