Weakly-supervised Temporal Action Localization by Uncertainty Modeling
Authors: Pilhyeon Lee, Jinglu Wang, Yan Lu, Hyeran Byun | Pages: 1854-1862
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our uncertainty modeling is effective at alleviating the interference of background frames and brings a large performance gain without bells and whistles. We demonstrate that our model significantly outperforms state-of-the-art methods on the benchmarks, THUMOS'14 and ActivityNet (1.2 & 1.3). To validate the effectiveness of our method, we perform experiments on two standard benchmarks, THUMOS'14 and ActivityNet. |
| Researcher Affiliation | Collaboration | Pilhyeon Lee (1)*, Jinglu Wang (2), Yan Lu (2), Hyeran Byun (1, 3). (1) Department of Computer Science, Yonsei University; (2) Microsoft Research Asia; (3) Graduate School of AI, Yonsei University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/Pilhyeon/WTAL-Uncertainty-Modeling. |
| Open Datasets | Yes | THUMOS'14 (Jiang et al. 2014) is a widely used dataset for temporal action localization, containing 200 validation videos and 213 test videos of 20 action classes. On the other hand, ActivityNet (Caba Heilbron et al. 2015) is a large-scale benchmark with two versions. |
| Dataset Splits | Yes | Following the previous work, we use validation videos for training and test videos for test. On the other hand, ActivityNet (Caba Heilbron et al. 2015) is a large-scale benchmark with two versions. ActivityNet 1.3, consisting of 200 action categories, includes 10,024 training videos, 4,926 validation videos and 5,044 test videos. ActivityNet 1.2 is a subset of the version 1.3, and is composed of 4,819 training videos, 2,383 validation videos and 2,480 test videos of 100 action classes. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only implies general computing environments without specifics. |
| Software Dependencies | No | The paper mentions using 'I3D networks (Carreira and Zisserman 2017)' and 'TVL1 algorithm (Wedel et al. 2009)' but does not provide specific version numbers for any ancillary software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | All hyper-parameters are set by grid search; m = 100, r_act = 9, r_bkg = 4, α = 5 × 10⁻⁴, β = 1, and θ_vid = 0.2. Multiple thresholds from 0 to 0.25 with a step size 0.025 are used as θ_seg, then we perform non-maximum suppression (NMS) with an IoU threshold of 0.6. |
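
The post-processing described in the Experiment Setup row (a sweep of segment thresholds followed by NMS at an IoU threshold of 0.6) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the names `SEG_THRESHOLDS`, `temporal_iou`, and `temporal_nms` are hypothetical, proposals are assumed to be `(start, end, score)` tuples, and including the endpoint 0.25 in the threshold list is an assumption, since the quoted text does not say whether the range is inclusive.

```python
from typing import List, Tuple

# Hypothetical proposal format: (start_time, end_time, confidence_score).
Segment = Tuple[float, float, float]

# Segment-level thresholds from 0 to 0.25 with a step of 0.025.
# Assumption: the endpoint 0.25 is included. Values are built from
# integers and rounded to avoid floating-point drift.
SEG_THRESHOLDS: List[float] = [round(i * 0.025, 3) for i in range(11)]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], iou_threshold: float = 0.6) -> List[Segment]:
    """Greedy NMS: keep the highest-scoring segments, dropping any segment
    whose IoU with an already kept segment exceeds iou_threshold."""
    kept: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


# Usage: two heavily overlapping proposals and one separate proposal.
proposals = [(0.0, 10.0, 0.9), (1.0, 10.0, 0.8), (20.0, 30.0, 0.7)]
result = temporal_nms(proposals, iou_threshold=0.6)
```

In a full pipeline, proposals produced at every value in `SEG_THRESHOLDS` would typically be pooled per class before the NMS step; that pooling is omitted here because the table does not describe it.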