Generative Model-Based Feature Knowledge Distillation for Action Recognition
Authors: Guiqin Wang, Peng Zhao, Yanjiang Shi, Cong Zhao, Shusen Yang
Venue: AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The efficacy of our approach is demonstrated through comprehensive experiments on diverse popular datasets, proving considerable enhancements in video action recognition task. |
| Researcher Affiliation | Academia | 1 School of Computer Science and Technology, Xi'an Jiaotong University 2 School of Mathematics and Statistics, Xi'an Jiaotong University 3 National Engineering Laboratory for Big Data Analytics, Xi'an Jiaotong University |
| Pseudocode | No | The paper describes the methodology using text and diagrams, but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Our code is available at https://github.com/aaai24/Generative-based-KD. |
| Open Datasets | Yes | To validate the effectiveness of our model, we conduct extensive experiments on commonly-used action recognition benchmark UCF101 (Soomro, Zamir, and Shah 2012) and HMDB51 (Kuehne et al. 2011), commonly-used action detection benchmark THUMOS14 (Jiang et al. 2014). |
| Dataset Splits | Yes | UCF101 It consists of 13320 action videos, including 101 action categories, which has 3 official splits and each split divides the training set and test set at a ratio of 7:3. HMDB51 It consists of 6849 video clips, which contains 51 action categories and each category includes at least 101 video clips. It has the same split ratio with UCF101 dataset. THUMOS14 It contains 101 categories of videos and is composed of four parts: training, validation, testing and background set. Each set includes 13320, 1010, 1574 and 2500 videos, respectively. Following the common setting (Jiang et al. 2014), we used 200 videos in the validation set for training, 213 videos in the testing set for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments. It mentions 'edge devices' in general terms but no concrete hardware for their experiments. |
| Software Dependencies | No | The paper mentions using 'tvl1' for optical frames and 'SGD optimizer' and 'Adam optimizer' but does not specify version numbers for any software, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | On UCF101 and HMDB51, ... The length of the clip is set to 64. We resize the frame to 256 for UCF101 and 240 for HMDB51. ... On THUMOS14, we sample RGB and optical flow at 10 fps and split video into clips of 256 frames. Adjacent clips have a temporal overlap of 30 frames during training and 128 frames during testing. The size of frame is set to 96 × 96. ... β is an empirical hyperparameter, β = 0.01. ... γ is an empirical hyperparameter, γ = 0.1. ... α = 0.1 is an empirical hyperparameter. |
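The quoted "Experiment Setup" evidence can be collected into a single configuration sketch. The Python snippet below is illustrative only: the class and field names are hypothetical, and only the numeric values (clip lengths, resize resolutions, sampling rate, clip overlaps, and the α/β/γ hyperparameters) come from the paper's reported settings; it is not the authors' released configuration.

```python
# Hypothetical summary of the reported settings; field names are illustrative,
# only the numeric values are taken from the paper's experiment setup.
from dataclasses import dataclass


@dataclass
class RecognitionSetup:
    """UCF101 / HMDB51 action recognition settings."""
    clip_length: int = 64        # frames per clip
    resize_ucf101: int = 256     # frame resize for UCF101
    resize_hmdb51: int = 240     # frame resize for HMDB51


@dataclass
class DetectionSetup:
    """THUMOS14 action detection settings."""
    sample_fps: int = 10         # RGB and optical-flow sampling rate
    clip_length: int = 256       # frames per clip
    train_overlap: int = 30      # temporal overlap of adjacent clips (training)
    test_overlap: int = 128      # temporal overlap of adjacent clips (testing)
    frame_size: int = 96         # frames resized to 96 x 96


@dataclass
class LossHyperparameters:
    """Empirical loss-weight hyperparameters reported in the paper."""
    alpha: float = 0.1           # α
    beta: float = 0.01           # β
    gamma: float = 0.1           # γ
```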