Self-Supervised Video Action Localization with Adversarial Temporal Transforms

Authors: Guoqiang Gong, Liangfeng Zheng, Wenhao Jiang, Yadong Mu

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on THUMOS14 and ActivityNet demonstrate that our model consistently outperforms the state-of-the-art weakly-supervised temporal action localization methods."
Researcher Affiliation | Collaboration | 1. Wangxuan Institute of Computer Technology, Peking University; 2. Tencent AI Lab. {gonggq, zhengliangfeng, myd}@pku.edu.cn, cswhjiang@gmail.com
Pseudocode | No | The paper describes algorithmic steps in prose but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository.
Open Datasets | Yes | "To evaluate our method, we conduct experiments on two video benchmarks: THUMOS14 [Idrees et al., 2017] and ActivityNet [Heilbron et al., 2015]."
Dataset Splits | Yes | "ActivityNet-1.3 ... This dataset is divided into training, validation and testing sets with a ratio of 2:1:1. ... we use the training set to train our model and evaluate on the validation set as in previous work [Paul et al., 2018; Narayan et al., 2019]."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | "The action localization model is trained with batch size 24 and optimized by Adam. The learning rate of the localization model is 0.001 on ActivityNet and 0.0001 on THUMOS14. The policy network is optimized by Adam with a 0.0001 learning rate on ActivityNet and a 0.00001 learning rate on THUMOS14. α in Eqn. 1 is 0.5. For action localization, classes whose video-level probabilities are below 0.1 are filtered out. For each remaining class c, a set of threshold values ranging over [0.1:1.0:0.1] × mean(S[:, c]) is used to generate action proposals. γ is set to 0.1 when scoring proposals."
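The proposal-generation step quoted above (filter classes by video-level probability, then threshold the class activation sequence S[:, c] at multiples of its mean) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name, array shapes, and the outer-inner contrast score (with γ setting the relative extension of the outer region) are assumptions about how γ enters the scoring.

```python
import numpy as np

def generate_proposals(S, video_probs, cls_thresh=0.1, gamma=0.1):
    """Sketch of multi-threshold action proposal generation.

    S           : (T, C) snippet-level class activation scores
    video_probs : (C,) video-level class probabilities
    Names, shapes, and the contrast-based score are illustrative assumptions.
    """
    T, _ = S.shape
    proposals = []  # tuples of (class, start, end, score)
    for c, p in enumerate(video_probs):
        if p < cls_thresh:                    # filter out unlikely classes
            continue
        base = S[:, c].mean()
        for r in np.arange(0.1, 1.01, 0.1):   # thresholds [0.1:1.0:0.1] * mean(S[:, c])
            mask = S[:, c] > r * base
            start = None
            for t in range(T + 1):            # walk contiguous runs of active snippets
                active = t < T and mask[t]
                if active and start is None:
                    start = t
                elif not active and start is not None:
                    inner = S[start:t, c].mean()
                    # assumed scoring: inner mean minus the mean of a surrounding
                    # outer region whose width is gamma * proposal length
                    ext = max(1, int(gamma * (t - start)))
                    lo, hi = max(0, start - ext), min(T, t + ext)
                    outer = np.concatenate([S[lo:start, c], S[t:hi, c]])
                    outer_mean = outer.mean() if outer.size else 0.0
                    proposals.append((c, start, t, inner - outer_mean))
                    start = None
    return proposals
```

Running all ten thresholds per class yields duplicate segments when the activation is flat; in practice such proposals would be merged with non-maximum suppression before evaluation.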