Self-Supervised Video Action Localization with Adversarial Temporal Transforms
Authors: Guoqiang Gong, Liangfeng Zheng, Wenhao Jiang, Yadong Mu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on THUMOS14 and ActivityNet demonstrate that our model consistently outperforms the state-of-the-art weakly-supervised temporal action localization methods. |
| Researcher Affiliation | Collaboration | ¹Wangxuan Institute of Computer Technology, Peking University; ²Tencent AI Lab. {gonggq, zhengliangfeng, myd}@pku.edu.cn, cswhjiang@gmail.com |
| Pseudocode | No | The paper describes algorithmic steps in prose but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository. |
| Open Datasets | Yes | To evaluate our method, we conduct experiments on two video benchmarks: THUMOS14 [Idrees et al., 2017] and ActivityNet [Heilbron et al., 2015]. |
| Dataset Splits | Yes | ActivityNet-1.3... This dataset is divided into training, validation and testing sets with a ratio of 2:1:1. ...we use the training set to train our model and evaluate on the validation set as in previous work [Paul et al., 2018; Narayan et al., 2019]. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes (an illustrative sketch follows the table) | The action localization model is trained with batch size 24 and optimized by Adam. The learning rate of the localization model is 0.001 on ActivityNet and 0.0001 on THUMOS14. The policy network is optimized by Adam with a 0.0001 learning rate on ActivityNet and a 0.00001 learning rate on THUMOS14. α in Eqn. 1 is 0.5. For action localization, classes whose video-level probabilities are below 0.1 are filtered out. For each remaining class c, a set of threshold values ranging over [0.1 : 1.0 : 0.1] · mean(S[:, c]) is used to generate action proposals. γ is set to 0.1 when scoring proposals. |
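The Experiment Setup row above pins down the optimizer choices, the per-dataset learning rates, and the thresholding rule used to turn class activation scores into action proposals. Since no source code is released, the following is a minimal, hedged sketch of how those quoted settings could be wired together in PyTorch. The placeholder modules, feature and class dimensions, and the helper name `generate_proposals` are assumptions made purely for illustration, not the authors' implementation.

```python
# Minimal sketch of the quoted training configuration and proposal rule.
# NOT the authors' code: the placeholder modules, tensor shapes, and the
# helper name `generate_proposals` are assumptions for illustration only.
import torch
import torch.nn as nn

# Placeholder stand-ins for the localization model and policy network.
localization_model = nn.Linear(2048, 20)   # assumed feature/class dims
policy_network = nn.Linear(2048, 8)        # assumed action-space size

# Quoted settings: batch size 24, Adam; on THUMOS14 the localization model
# uses lr=1e-4 and the policy network lr=1e-5 (1e-3 / 1e-4 on ActivityNet).
loc_optimizer = torch.optim.Adam(localization_model.parameters(), lr=1e-4)
pol_optimizer = torch.optim.Adam(policy_network.parameters(), lr=1e-5)

def generate_proposals(S: torch.Tensor, video_probs: torch.Tensor,
                       prob_thresh: float = 0.1):
    """Turn snippet-level class scores S (T x C) into (start, end, class) proposals.

    Classes with video-level probability below `prob_thresh` are discarded;
    for each kept class c, thresholds 0.1..1.0 (step 0.1) times mean(S[:, c])
    are applied, and each contiguous above-threshold run of snippets becomes
    one proposal.
    """
    T, C = S.shape
    proposals = []
    for c in range(C):
        if video_probs[c] < prob_thresh:
            continue
        base = S[:, c].mean()
        for ratio in [i / 10 for i in range(1, 11)]:
            mask = (S[:, c] > ratio * base).tolist()
            t = 0
            while t < T:
                if mask[t]:
                    start = t
                    while t < T and mask[t]:
                        t += 1
                    proposals.append((start, t, c))
                else:
                    t += 1
    return proposals

# Example usage on random scores for a 100-snippet, 20-class video.
S = torch.rand(100, 20)
video_probs = torch.rand(20)
print(len(generate_proposals(S, video_probs)))
```

Scoring of the resulting proposals (where the quoted γ = 0.1 enters) is not reproduced here, since the report gives no further detail on that step.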