Learning Disentangled Classification and Localization Representations for Temporal Action Localization
Authors: Zixin Zhu, Le Wang, Wei Tang, Ziyi Liu, Nanning Zheng, Gang Hua
AAAI 2022, pp. 3644-3652 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method on two popular benchmarks for TAL, which outperforms all state-of-the-art methods. ... Experiments Datasets. The THUMOS14 (Jiang et al. 2014) dataset provides temporal annotations for 20 action categories. ... ActivityNet v1.3 (Heilbron et al. 2015) is currently the largest dataset of action analysis in videos... Ablation Study. In order to explore the effectiveness of our disentanglement network and how disentangled features are better than original features, we conducted in-depth ablation experiments. |
| Researcher Affiliation | Collaboration | Zixin Zhu (1), Le Wang (1)*, Wei Tang (2), Ziyi Liu (3), Nanning Zheng (1), Gang Hua (3); (1) Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; (2) University of Illinois at Chicago; (3) Wormpex AI Research |
| Pseudocode | No | The paper describes its framework and components with text and diagrams but does not include formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not include any explicit statements about releasing source code or provide a link to a code repository for their method. |
| Open Datasets | Yes | The THUMOS14 (Jiang et al. 2014) dataset provides temporal annotations for 20 action categories. ... ActivityNet v1.3 (Heilbron et al. 2015) is currently the largest dataset of action analysis in videos... |
| Dataset Splits | Yes | Following the common setting in THUMOS14, we apply 200 videos (including 3,007 action instances) in the validation set for training and conduct evaluation on the 213 annotated videos (including 3,358 action instances) from the test set. ... The training set contains about 10,000 untrimmed videos. Both the validation set and the test set contain about 5,000 untrimmed videos. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions using specific networks/models such as I3D, BSN, and UntrimmedNet but does not provide specific software dependencies (e.g., programming languages, libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | The interval between snippets is set to 16 frames. ... The ratio of fusing the RGB and optical flow predictions is 5:6. ... In all experiments, we set λ1 = λ2 = 0.5. |
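The reported setup combines RGB and optical-flow predictions at a 5:6 ratio. The paper does not give code for this step; the sketch below is a minimal, hypothetical interpretation of such a weighted two-stream fusion, with array shapes, function name, and per-snippet score layout all assumed rather than taken from the paper.

```python
import numpy as np

def fuse_two_stream_scores(rgb_scores: np.ndarray,
                           flow_scores: np.ndarray,
                           rgb_weight: float = 5.0,
                           flow_weight: float = 6.0) -> np.ndarray:
    """Weighted fusion of per-snippet class scores at the stated 5:6 ratio.

    Both inputs are assumed to have shape (num_snippets, num_classes);
    the weights are normalized so that rows that sum to 1 stay normalized.
    """
    total = rgb_weight + flow_weight
    return (rgb_weight * rgb_scores + flow_weight * flow_scores) / total

# Toy usage: two snippets, three action classes (values are illustrative).
rgb = np.array([[0.8, 0.1, 0.1],
                [0.2, 0.7, 0.1]])
flow = np.array([[0.6, 0.3, 0.1],
                 [0.1, 0.8, 0.1]])
fused = fuse_two_stream_scores(rgb, flow)
```

Normalizing by the weight sum (here 11) keeps the fused scores on the same scale as the inputs; the 5:6 ratio slightly favors the optical-flow stream, consistent with the ratio quoted above.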