Soft Video Parsing by Label Distribution Learning

Authors: Xin Geng, Miaogen Ling

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct our experiments on two benchmark datasets: THUMOS 2014 Detection Challenge dataset (Jiang et al. 2014) and MSR-II Action dataset (Yuan, Liu, and Wu 2011). The proposed method shows promising results on the THUMOS 14 and MSR-II datasets and its computational complexity is much less than the state-of-the-art method.
Researcher Affiliation | Academia | Xin Geng, Miaogen Ling MOE Key Laboratory of Computer Network and Information Integration, School of Computer Science and Engineering, Southeast University, Nanjing 210096, China {xgeng,mgling}@seu.edu.cn
Pseudocode | Yes | Algorithm 1 Soft Video Parsing Training:
Open Source Code | No | The paper does not provide any links to or explicit statements about the availability of its source code.
Open Datasets | Yes | We conduct our experiments on two benchmark datasets: THUMOS 2014 Detection Challenge dataset (Jiang et al. 2014) and MSR-II Action dataset (Yuan, Liu, and Wu 2011).
Dataset Splits | Yes | We conduct 5-fold cross validation on the two datasets. For SSRG and our method (denoted by SP for Soft Parsing) we use the training set of the first fold to select the optimal number of sub-action via 3-fold cross validation.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper mentions using techniques like 'L-BFGS', 'SVM model', and 'RBF-χ2 kernel SVM', but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | As for the parameters in SSRG and SP, the minimum semantic unit duration ρ_l is set as 10 frames to eliminate the small fluctuation. For different numbers of sub-actions in each action, 1, 2 and 3, the maximum semantic unit duration ρ_u is set as 300, 200 and 150, respectively and for the background, ρ_u is set as 300. As for RNMS, the parameter C of SVM is set by 3-fold cross validation on the training set of the first fold. It is chosen from the range C ∈ {3^-2, 3^-1, …, 3^7} (Wang et al. 2015). Moreover, the length of candidate sliding window is chosen from 10, 20, …, 300 frames and the sliding step is set as 10 frames.
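
The search ranges quoted in the Experiment Setup row can be written out explicitly. The following is a minimal sketch, not the authors' code (none is released): all function and constant names are hypothetical, the C range is read as powers of 3 from exponent -2 to 7, and the placement of windows (starting at frame 0, end index exclusive) is an assumption the paper excerpt does not specify.

```python
# Illustrative constants taken from the Experiment Setup row.
RHO_L = 10                                      # minimum semantic unit duration (frames)
RHO_U_BY_SUBACTIONS = {1: 300, 2: 200, 3: 150}  # maximum duration per number of sub-actions
RHO_U_BACKGROUND = 300                          # maximum duration for the background


def svm_c_grid(lo_exp=-2, hi_exp=7):
    """Candidate SVM C values, assumed to be 3**-2, 3**-1, ..., 3**7."""
    return [3.0 ** e for e in range(lo_exp, hi_exp + 1)]


def window_lengths(start=10, stop=300, step=10):
    """Candidate sliding-window lengths in frames: 10, 20, ..., 300."""
    return list(range(start, stop + 1, step))


def sliding_windows(n_frames, lengths, step=10):
    """Yield (start, end) frame pairs, end exclusive (assumed convention).

    Windows of each candidate length are slid over the video with the
    given step; windows that would run past the last frame are skipped.
    """
    for length in lengths:
        for start in range(0, n_frames - length + 1, step):
            yield start, start + length


if __name__ == "__main__":
    print(len(svm_c_grid()), "candidate C values")
    print(len(window_lengths()), "candidate window lengths")
    print(list(sliding_windows(30, [10, 20])))
```

Under this reading there are 10 candidate values of C and 30 candidate window lengths, so the 3-fold selection on the first training fold covers a small, fixed grid.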