Self-Supervised Video Action Localization with Adversarial Temporal Transforms
Authors: Guoqiang Gong, Liangfeng Zheng, Wenhao Jiang, Yadong Mu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on THUMOS14 and ActivityNet demonstrate that our model consistently outperforms the state-of-the-art weakly-supervised temporal action localization methods. |
| Researcher Affiliation | Collaboration | ¹Wangxuan Institute of Computer Technology, Peking University; ²Tencent AI Lab. {gonggq, zhengliangfeng, myd}@pku.edu.cn, cswhjiang@gmail.com |
| Pseudocode | No | The paper describes algorithmic steps in prose but does not include structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository. |
| Open Datasets | Yes | To evaluate our method, we conduct experiments on two video benchmarks: THUMOS14 [Idrees et al., 2017] and ActivityNet [Heilbron et al., 2015]. |
| Dataset Splits | Yes | ActivityNet-1.3... This dataset is divided into training, validation and testing sets with a ratio of 2:1:1. ...we use the training set to train our model and evaluate on the validation set as in previous work [Paul et al., 2018; Narayan et al., 2019]. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes (an illustrative sketch follows the table) | The action localization model is trained with batch size 24 and optimized by Adam. The learning rate of the localization model is 0.001 on ActivityNet and 0.0001 on THUMOS14. The policy network is optimized by Adam with a 0.0001 learning rate on ActivityNet and a 0.00001 learning rate on THUMOS14. α in Eqn. 1 is 0.5. For action localization, classes whose video-level probabilities are below 0.1 are filtered out. For each remaining class c, a set of threshold values ranging over [0.1 : 1.0 : 0.1] · mean(S[:, c]) is used to generate action proposals. γ is set to 0.1 when scoring proposals. |
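The Experiment Setup row above pins down the optimizer choices, the per-dataset learning rates, and the thresholding rule used to turn class activation scores into action proposals. Since no source code is released, the following is a minimal, hedged sketch of how those quoted settings could be wired together in PyTorch. The placeholder modules, feature and class dimensions, and the helper name `generate_proposals` are assumptions made purely for illustration, not the authors' implementation.

```python
# Minimal sketch of the quoted training configuration and proposal rule.
# NOT the authors' code: the placeholder modules, tensor shapes, and the
# helper name `generate_proposals` are assumptions for illustration only.
import torch
import torch.nn as nn

# Placeholder stand-ins for the localization model and policy network.
localization_model = nn.Linear(2048, 20)   # assumed feature/class dims
policy_network = nn.Linear(2048, 8)        # assumed action-space size

# Quoted settings: batch size 24, Adam; on THUMOS14 the localization model
# uses lr=1e-4 and the policy network lr=1e-5 (1e-3 / 1e-4 on ActivityNet).
loc_optimizer = torch.optim.Adam(localization_model.parameters(), lr=1e-4)
pol_optimizer = torch.optim.Adam(policy_network.parameters(), lr=1e-5)

def generate_proposals(S: torch.Tensor, video_probs: torch.Tensor,
                       prob_thresh: float = 0.1):
    """Turn snippet-level class scores S (T x C) into (start, end, class) proposals.

    Classes with video-level probability below `prob_thresh` are discarded;
    for each kept class c, thresholds 0.1..1.0 (step 0.1) times mean(S[:, c])
    are applied, and each contiguous above-threshold run of snippets becomes
    one proposal.
    """
    T, C = S.shape
    proposals = []
    for c in range(C):
        if video_probs[c] < prob_thresh:
            continue
        base = S[:, c].mean()
        for ratio in [i / 10 for i in range(1, 11)]:
            mask = (S[:, c] > ratio * base).tolist()
            t = 0
            while t < T:
                if mask[t]:
                    start = t
                    while t < T and mask[t]:
                        t += 1
                    proposals.append((start, t, c))
                else:
                    t += 1
    return proposals

# Example usage on random scores for a 100-snippet, 20-class video.
S = torch.rand(100, 20)
video_probs = torch.rand(20)
print(len(generate_proposals(S, video_probs)))
```

Scoring of the resulting proposals (where the quoted γ = 0.1 enters) is not reproduced here, since the report gives no further detail on that step.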