Action-guided 3D Human Motion Prediction
Authors: Jiangxin Sun, Zihang Lin, Xintong Han, Jian-Fang Hu, Jia Xu, Wei-Shi Zheng
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed approach, and we achieve state-of-the-art performance on 3D human motion prediction. |
| Researcher Affiliation | Collaboration | Jiangxin Sun (Sun Yat-sen University, sunjx5@mail2.sysu.edu.cn); Zihang Lin (Sun Yat-sen University, linzh59@mail2.sysu.edu.cn); Xintong Han (Huya Inc, hanxintong@huya.com); Jian-Fang Hu (Sun Yat-sen University, hujf5@mail.sysu.edu.cn); Jia Xu (Huya Inc, xujia@huya.com); Wei-Shi Zheng (Sun Yat-sen University, wszheng@ieee.org) |
| Pseudocode | No | The paper describes the methodology in prose and with architectural diagrams (Figure 2, Figure 3), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper states 'For training the 3D skeleton-based action classifier, we use the publicly available code of [20] with default parameters.' This refers to a third-party implementation, not the authors' own code for their proposed method. There is no explicit statement or link indicating that the source code for their method is publicly available. |
| Open Datasets | Yes | We evaluate the proposed method on Human3.6M [14] and Penn Action [44] datasets, which contain human action videos captured under different views with various motions. |
| Dataset Splits | Yes | Following [42], we use the data of Subjects 1, 6, 7, and 8 as the training set, data of Subject 5 as the validation set, and data of Subjects 9 and 11 as the test set. |
| Hardware Specification | Yes | It took about 3 days for the Human3.6M dataset and 20 hours for the Penn Action dataset to train our framework using 4 V100 GPUs. |
| Software Dependencies | No | The paper mentions using a 3D human reconstruction model from PHD [42] and publicly available code from [20], but it does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in their implementation. |
| Experiment Setup | Yes | Network architecture: The dimensions of the encoded 3D human body parameters B(θ), B(β), B(Π) are 1024, 256, 64, respectively. Both the encoder and decoder are designed as 3 stacked FC-ReLU-BN blocks. The numbers of LSTM hidden units for predicting θ, β, Π are 1024, 256, 64, respectively. The temporal length t of each memory item is 8 and the sampling time interval K of long-term motion is set to 3. The channels of both the key vector Ck and value vector Cv are kept the same and set to 512. Learning details: We train our predictors using the SGD algorithm with a Nesterov momentum of 0.9... In the first training stage, we train the encoder and decoder with learning rate 0.01 for 6 epochs. In the second stage, we train the whole prediction block in an end-to-end manner for 30 epochs. The learning rate is initialized to 0.01 and decreased to 0.001 after 12 epochs. |
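The two-stage training schedule quoted in the setup row can be sketched as a simple learning-rate function. The stage structure, epoch counts, and rates (0.01 for 6 epochs in stage 1; 0.01 dropping to 0.001 after 12 of 30 epochs in stage 2) are taken from the paper's reported setup; the function name and the stage/epoch encoding are illustrative assumptions, not the authors' code.

```python
def learning_rate(stage: int, epoch: int) -> float:
    """Illustrative sketch of the paper's reported LR schedule.

    Stage 1: encoder/decoder training, lr 0.01 for 6 epochs.
    Stage 2: end-to-end training for 30 epochs, lr 0.01,
             decreased to 0.001 after 12 epochs.
    Epochs are 0-indexed here (an assumption; the paper does not say).
    """
    if stage == 1:
        if not 0 <= epoch < 6:
            raise ValueError("stage 1 runs for 6 epochs")
        return 0.01
    if stage == 2:
        if not 0 <= epoch < 30:
            raise ValueError("stage 2 runs for 30 epochs")
        return 0.01 if epoch < 12 else 0.001
    raise ValueError("stage must be 1 or 2")
```

For example, `learning_rate(2, 11)` returns 0.01, while `learning_rate(2, 12)` returns 0.001, matching the reported drop after 12 epochs of end-to-end training.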