Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition
Authors: Pichao Wang, Wanqing Li, Jun Wan, Philip Ogunbona, Xinwang Liu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method was extensively evaluated on two large RGB-D action recognition datasets, ChaLearn LAP IsoGD and NTU RGB+D, and one small dataset, SYSU 3D HOI, and achieved state-of-the-art results. |
| Researcher Affiliation | Collaboration | Pichao Wang (1,2), Wanqing Li (1), Jun Wan (3), Philip Ogunbona (1), Xinwang Liu (4). (1) Advanced Multimedia Research Lab, University of Wollongong, Australia; (2) Motovis Inc.; (3) Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; (4) School of Computer Science, National University of Defense Technology, Changsha 410073, China |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It describes the method in prose and provides equations. |
| Open Source Code | No | The paper states 'The proposed method was implemented using the Caffe framework (Jia et al. 2014b).' which refers to a third-party framework, but it does not provide a link or explicit statement about the availability of their own implementation's source code. |
| Open Datasets | Yes | The proposed method was evaluated on three benchmark RGB-D datasets, namely, two large ones, ChaLearn LAP IsoGD (Wan et al. 2016) and NTU RGB+D (Shahroudy et al. 2016), and a small one, the SYSU 3D HOI dataset (Hu et al. 2015). |
| Dataset Splits | Yes | The dataset [ChaLearn LAP IsoGD] is divided into training, validation and test sets. [...] It [NTU RGB+D] has both cross-subject and cross-view evaluation. In the cross-subject evaluation, samples of subjects 1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19, 25, 27, 28, 31, 34, 35 and 38 were used for training and samples of the remaining subjects were reserved for testing. In the cross-view evaluation, samples taken by cameras 2 and 3 were used for training, while the testing set includes samples from camera 1. (See the split sketch after this table.) |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. |
| Software Dependencies | No | The paper states 'The proposed method was implemented using the Caffe framework (Jia et al. 2014b).' It mentions a software framework but does not specify a version number for Caffe or any other critical software dependencies. |
| Experiment Setup | Yes | The initial learning rate was set to 0.001 and decreased by a factor of 10 every 12 epochs. The batch size was set to 50 images, with 5 actions in each batch. The network weights are learned using mini-batch stochastic gradient descent with the momentum set to 0.9 and weight decay set to 0.0005. The parameter γ was assigned the value 10 in order to ensure that the two losses are of comparable magnitude. Parameters α and λ were assigned values that depend on the level of difficulty of the datasets. For this dataset [ChaLearn LAP IsoGD], the margin α was set to 0.2. The parameter λ was set to a value of 5 to solve the more difficult task of learning large cross-modality discrepancy. For this dataset [NTU RGB+D], the margin α was set to 0.1 while λ was set to 2. For this dataset [SYSU 3D HOI], the margin α was set to 0 while λ was set to 1. (See the optimizer sketch after this table.) |
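
The NTU RGB+D evaluation protocol quoted in the Dataset Splits row is mechanical enough to express in code. The sketch below assigns a sample to train or test under either protocol using the subject and camera lists from the paper; the filename convention it parses (SsssCcccPpppRrrrAaaa) is the dataset's published naming scheme, not something stated in this report, so treat that part as an assumption.

```python
import re

# Subject IDs used for training in the cross-subject protocol (from the paper).
CROSS_SUBJECT_TRAIN = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19,
                       25, 27, 28, 31, 34, 35, 38}
# Cameras used for training in the cross-view protocol (camera 1 is held out).
CROSS_VIEW_TRAIN_CAMERAS = {2, 3}

# Assumed NTU RGB+D sample-name convention: SsssCcccPpppRrrrAaaa,
# e.g. "S001C002P003R002A013" (setup, camera, performer, replication, action).
_NAME_RE = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

def split_of(sample_name: str, protocol: str = "cross-subject") -> str:
    """Return 'train' or 'test' for one sample under the chosen protocol."""
    m = _NAME_RE.search(sample_name)
    if m is None:
        raise ValueError(f"unrecognized sample name: {sample_name!r}")
    camera, subject = int(m.group(2)), int(m.group(3))
    if protocol == "cross-subject":
        return "train" if subject in CROSS_SUBJECT_TRAIN else "test"
    if protocol == "cross-view":
        return "train" if camera in CROSS_VIEW_TRAIN_CAMERAS else "test"
    raise ValueError(f"unknown protocol: {protocol!r}")

# Example: subject 3 is not in the training list, so this sample is 'test'
# under cross-subject, but camera 2 makes it 'train' under cross-view.
print(split_of("S001C002P003R002A013", "cross-subject"))  # -> test
print(split_of("S001C002P003R002A013", "cross-view"))     # -> train
```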
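The hyperparameters in the Experiment Setup row map directly onto a standard SGD configuration. The paper implemented its method in Caffe; the minimal sketch below restates the reported values in PyTorch instead, so the placeholder model, the epoch count, and the wiring of the loop are illustrative assumptions rather than the authors' code.

```python
import torch

# Values quoted from the paper in the row above.
LR, MOMENTUM, WEIGHT_DECAY = 0.001, 0.9, 0.0005
BATCH_SIZE = 50                       # 50 images per batch, 5 actions per batch
LR_STEP_EPOCHS, LR_GAMMA = 12, 0.1    # divide the learning rate by 10 every 12 epochs
GAMMA_LOSS = 10.0                     # γ, balancing the two loss terms
ALPHA, LAMBDA = 0.2, 5.0              # α and λ for ChaLearn LAP IsoGD;
                                      # 0.1 / 2 for NTU RGB+D, 0 / 1 for SYSU 3D HOI

model = torch.nn.Linear(512, 249)     # placeholder network, not the paper's architecture
optimizer = torch.optim.SGD(model.parameters(), lr=LR,
                            momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=LR_STEP_EPOCHS,
                                            gamma=LR_GAMMA)

for epoch in range(36):               # epoch count is illustrative; not reported
    # Per-batch forward/backward with the combined loss would go here; the paper
    # weights its second loss term by γ so the two losses have comparable magnitude.
    scheduler.step()
```

Note that the total loss combination itself is not reconstructed here, since the report only quotes the hyperparameter values, not the exact loss formulation.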