Recognizing Actions in 3D Using Action-Snippets and Activated Simplices

Authors: Chunyu Wang, John Flynn, Yizhou Wang, Alan Yuille

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first evaluate how well activated simplices can represent data by computing the distance between a data point and its projection onto the nearest simplex. Then we present action recognition results on three standard benchmark datasets (Li, Zhang, and Liu 2010; Seidenari et al. 2013; Xia, Chen, and Aggarwal 2012). We also provide diagnostic analysis. (A minimal projection sketch follows the table.)
Researcher Affiliation | Academia | (1) Nat'l Eng. Lab. for Video Technology, Cooperative Medianet Innovation Center, Key Lab. of Machine Perception (MoE), Sch'l of EECS, Peking University, Beijing, 100871, China ({wangchunyu, Yizhou.Wang}@pku.edu.cn); (2) Department of Statistics, University of California, Los Angeles (UCLA), USA ({john.flynn, yuille}@stat.ucla.edu).
Pseudocode | No | The paper describes the algorithm and process in prose and mathematical equations but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper makes no explicit statement about releasing source code for the described methodology and includes no link to a code repository.
Open Datasets | Yes | We conduct experiments on a large human pose dataset, H3.6M (Ionescu et al. 2014), using 11,000 3D poses of 11 actions including 'taking photo', 'smoking', 'purchases', 'discussion', etc. to evaluate our method. The MSR-Action3D dataset (Li, Zhang, and Liu 2010) provides 557 human pose sequences of ten subjects performing 20 actions, recorded with a depth sensor. The Florence dataset (Seidenari et al. 2013) includes nine activities including wave, drink from a bottle, etc. The UTKinect dataset (Xia, Chen, and Aggarwal 2012) was captured using a single stationary Kinect.
Dataset Splits | Yes | We split the 11,000 poses into training and testing subsets, each containing 5,500 poses of the 11 actions. For MSR-Action3D, many works choose five subjects for training and the remaining five for testing, e.g. (Li, Zhang, and Liu 2010), and report the result based on a single split; to make the results more comparable, we experiment with all 252 possible splits and report the average accuracy. For Florence, following the dataset recommendation, we use a leave-one-actor-out protocol: we train the classifier using all the sequences from nine out of ten actors and test on the remaining one. For UTKinect, we use the standard leave-one-sequence-out protocol, where one sequence is used for testing and the remaining are used for training. (A sketch of these protocols follows the table.)
Hardware Specification | No | The acknowledgements mention GPUs ('We thank Xianjie Chen for helping improve the writing, and for support from the following research grants: 973-2015CB351800, the Okawa Foundation Research Grant, NSFC-61272027, NSFC-61231010, NSFC-61527804, NSFC-61421062, NSFC-61210005, ONR grant N00014-15-1-2356 and ARO 62250-CS. We also thank NVIDIA Corporation for donating the GPUs.'), but the paper does not specify exact GPU models or any other hardware details for the experiments.
Software Dependencies | No | The paper mentions methods and frameworks such as sparse coding (Mairal et al. 2009) and k-means initialized by k-means++ (Arthur and Vassilvitskii 2007), but does not list specific software libraries, packages, or programming languages with version numbers required for reproduction.
Experiment Setup | Yes | We set the number of bases for each class to be 40 (by cross-validation). We obtain about 15 activated simplices per class, whose dimensions are five on average; activated simplices achieve a recognition accuracy of 91.40%. We set the number of bases for each class to be 50 (450 in total) by cross-validation. We learn 40 bases and about 20 activated simplices for each action class; the dimension of the simplices is five on average. We also evaluated the influence of the two main parameters in the model, i.e. the number of bases and the number of poses in an action-snippet. (A cross-validation sketch follows the table.)
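
The projection step quoted in the Research Type row reduces to a small constrained least-squares problem: projecting a pose vector x onto the convex hull of a simplex's bases B means solving min_w ||x - Bw||^2 subject to w >= 0 and sum(w) = 1. Below is a minimal Python sketch of that computation, not the authors' code; project_onto_simplex and nearest_simplex_distance are hypothetical helper names, and SciPy's SLSQP solver stands in for whatever solver the paper actually used.

    # Sketch: distance from a data point x to its projection onto one simplex,
    # i.e. the constrained least squares  min_w ||x - Bw||^2
    # s.t. w >= 0, sum(w) = 1, where the columns of B are the simplex's bases.
    import numpy as np
    from scipy.optimize import minimize

    def project_onto_simplex(x, B):
        """Project x onto the convex hull of B's columns; return (point, distance)."""
        k = B.shape[1]
        res = minimize(
            lambda w: 0.5 * np.sum((x - B @ w) ** 2),   # squared residual
            np.full(k, 1.0 / k),                        # start at the barycenter
            jac=lambda w: B.T @ (B @ w - x),
            method="SLSQP",
            bounds=[(0.0, None)] * k,                   # w >= 0
            constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},  # sum(w) = 1
        )
        p = B @ res.x
        return p, float(np.linalg.norm(x - p))

    def nearest_simplex_distance(x, simplices):
        """Reconstruction error used in the evaluation: distance to nearest simplex."""
        return min(project_onto_simplex(x, B)[1] for B in simplices)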
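The split protocols in the Dataset Splits row can be made concrete with a short sketch. This is illustrative, not the authors' code: evaluate is a hypothetical callback that trains on one subject subset and tests on the other, and the count 252 is simply C(10, 5), the number of ways to pick five of the ten MSR-Action3D subjects for training.

    # Illustrative sketch of the three evaluation protocols described above.
    from itertools import combinations

    subjects = set(range(10))                  # MSR-Action3D: ten subjects
    splits = [(set(tr), subjects - set(tr))
              for tr in combinations(sorted(subjects), 5)]
    assert len(splits) == 252                  # C(10, 5) = 252 cross-subject splits

    def average_accuracy(evaluate):
        """evaluate(train_ids, test_ids) -> accuracy; averaged over all 252 splits."""
        return sum(evaluate(tr, te) for tr, te in splits) / len(splits)

    # Leave-one-actor-out (Florence, ten actors): hold out each actor in turn.
    loao_folds = [(subjects - {a}, {a}) for a in subjects]

    # Leave-one-sequence-out (UTKinect) is the same pattern over sequence indices.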
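Finally, the cross-validated choice of the number of bases per class in the Experiment Setup row amounts to a one-dimensional grid search. A hedged sketch follows, where fit_and_score is a hypothetical stand-in for training the activated-simplices model with k bases per class and returning held-out accuracy.

    # Sketch of the model-selection step; `fit_and_score` is hypothetical: it
    # would train the classifier with k bases per class on a fold's training
    # split and return accuracy on that fold's validation split.
    def select_num_bases(candidates, folds, fit_and_score):
        """Return the basis count with the best mean accuracy across folds."""
        def mean_acc(k):
            return sum(fit_and_score(k, tr, va) for tr, va in folds) / len(folds)
        return max(candidates, key=mean_acc)

    # e.g. scanning a small grid, which the paper resolves to 40 or 50 bases
    # per class depending on the dataset:
    # best_k = select_num_bases([20, 30, 40, 50, 60], folds, fit_and_score)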