Video Imprint Segmentation for Temporal Action Detection in Untrimmed Videos

Authors: Zhanning Gao, Le Wang, Qilin Zhang, Zhenxing Niu, Nanning Zheng, Gang Hua (pp. 8328-8335)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The efficacy of the framework is validated on two public action detection datasets. We conduct extensive experiments to evaluate our DBS framework on two challenging datasets, i.e., the THUMOS 14 (Jiang et al. 2014) dataset and the ActivityNet dataset (Heilbron et al. 2015). Experimental results show that the DBS method achieves state-of-the-art performance.
Researcher Affiliation | Collaboration | Zhanning Gao,1,2 Le Wang,1 Qilin Zhang,3 Zhenxing Niu,2 Nanning Zheng,1 Gang Hua4 — 1Xi'an Jiaotong University, 2Alibaba Group, 3HERE Technologies, 4Microsoft Cloud and AI
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We evaluate our method on both the per-frame labeling task and the temporal action detection task with the THUMOS 14 (Jiang et al. 2014) and ActivityNet (Heilbron et al. 2015) datasets. We also employ parts of UCF101 (Soomro, Zamir, and Shah 2012) to augment the training data.
Dataset Splits | Yes | THUMOS 14 has 1010 videos for validation and 1574 videos for testing. This dataset does not provide a training set. Following standard practice, our method is trained on the validation set and evaluated on the testing set. ActivityNet v1.2 contains 9682 videos in 100 classes, and ActivityNet v1.3 contains 19994 videos in 200 classes. These videos are divided into three subsets, i.e., training, validation and testing, in a 2:1:1 ratio.
Hardware Specification | Yes | We report the average GPU (Titan Xp with 12GB memory) running time on the THUMOS 14 test set (average video duration is 230s).
Software Dependencies | No | The paper mentions software components like "FCN" and the "Adam algorithm" but does not specify version numbers for any libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | The FCN consists of several convolutional layers; each layer uses a 3 × 3 kernel size and the same number of channels as the input video imprint. The activation units are ReLU for the middle convolutional layers and softmax for the last convolutional layer. The categorical cross-entropy loss function is adopted for training, and the FCN is trained with the adaptive moment estimation (Adam) algorithm (Kingma and Ba 2014). In our experiments, the number of epochs is set to 10, and the batch size is set to 16.
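For illustration, the quoted experiment setup can be sketched as a small PyTorch model and training loop. This is not the authors' code: the class and function names (ImprintFCN, train), the number of layers, the imprint channel count, the spatial layout of the video imprint, and the learning rate are assumptions, since the paper only specifies 3 × 3 kernels, channels matching the input imprint, ReLU middle layers, a softmax output, categorical cross-entropy, Adam, 10 epochs, and batch size 16. The softmax described for the last layer is folded into nn.CrossEntropyLoss, which combines softmax with categorical cross-entropy.

```python
# Minimal sketch of the per-frame labeling FCN described in the setup above.
# Layer count, channel count, and learning rate are placeholders (assumptions).
import torch
import torch.nn as nn

class ImprintFCN(nn.Module):
    def __init__(self, channels: int, num_classes: int, num_layers: int = 3):
        super().__init__()
        layers = []
        for _ in range(num_layers - 1):
            # 3x3 convolutions keeping the same channel count as the input imprint
            layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        # Last convolutional layer maps to per-class scores
        layers += [nn.Conv2d(channels, num_classes, kernel_size=3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Raw class scores; softmax is applied inside the loss function below
        return self.net(x)

def train(model, loader, epochs=10, lr=1e-3):
    # Adam optimizer (Kingma and Ba 2014); learning rate is an assumed value
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # CrossEntropyLoss = softmax + categorical cross-entropy
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):  # 10 epochs, as stated in the paper
        for imprints, labels in loader:  # batch size 16 set in the DataLoader
            opt.zero_grad()
            loss = criterion(model(imprints), labels)
            loss.backward()
            opt.step()
```

In this reading, the data loader would yield video-imprint tensors and dense per-position labels, with the batch size of 16 configured on the loader rather than in the model.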