Action-guided 3D Human Motion Prediction

Authors: Jiangxin Sun, Zihang Lin, Xintong Han, Jian-Fang Hu, Jia Xu, Wei-Shi Zheng

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed approach, and we achieve state-of-the-art performance on 3D human motion prediction.
Researcher Affiliation | Collaboration | Jiangxin Sun (Sun Yat-sen University, sunjx5@mail2.sysu.edu.cn); Zihang Lin (Sun Yat-sen University, linzh59@mail2.sysu.edu.cn); Xintong Han (Huya Inc, hanxintong@huya.com); Jian-Fang Hu (Sun Yat-sen University, hujf5@mail.sysu.edu.cn); Jia Xu (Huya Inc, xujia@huya.com); Wei-Shi Zheng (Sun Yat-sen University, wszheng@ieee.org)
Pseudocode | No | The paper describes the methodology in prose and with architectural diagrams (Figure 2, Figure 3), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper states 'For training the 3D skeleton-based action classifier, we use the publicly available code of [20] with default parameters.' This refers to a third-party implementation, not the authors' own code for their proposed method. There is no explicit statement or link indicating that the source code for their method is publicly available.
Open Datasets | Yes | We evaluate the proposed method on Human3.6M [14] and Penn Action [44] datasets, which contain human action videos captured under different views with various motions.
Dataset Splits | Yes | Following [42], we use the data of Subjects 1, 6, 7, and 8 as the training set, data of Subject 5 as the validation set, and data of Subjects 9 and 11 as the test set.
Hardware Specification | Yes | It took about 3 days for the Human3.6M dataset and 20 hours for the Penn Action dataset to train our framework using 4 V100 GPUs.
Software Dependencies | No | The paper mentions using a 3D human reconstruction model from PHD [42] and publicly available code from [20], but it does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in their implementation.
Experiment Setup | Yes | Network architecture: The dimensions of the encoded 3D human body parameters B(θ), B(β), B(Π) are 1024, 256, and 64, respectively. Both the encoder and decoder are designed as 3 stacked FC-ReLU-BN blocks. The numbers of LSTM hidden units for predicting θ, β, Π are 1024, 256, and 64, respectively. The temporal length t of each memory item is 8, and the sampling time interval K of long-term motion is set to 3. The channels of the key vector Ck and value vector Cv are kept the same and set to 512. Learning details: We train our predictors using the SGD algorithm with a Nesterov momentum of 0.9... In the first training stage, we train the encoder and decoder with learning rate 0.01 for 6 epochs. In the second stage, we train the whole prediction block in an end-to-end manner for 30 epochs. The learning rate is initialized to 0.01 and decreased to 0.001 after 12 epochs.
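
The Experiment Setup row is detailed enough to sketch in code. Below is a minimal, non-authoritative sketch of the stacked FC-ReLU-BN encoder/decoder blocks and the stage-two SGD schedule quoted above. It assumes PyTorch (the paper names no framework, and no authors' code is public), and the raw input sizes (72 for pose θ, 10 for shape β, 3 for camera Π) follow the common SMPL convention rather than anything stated in the paper; all module and variable names here are hypothetical.

```python
# A hedged sketch of the reported setup, not the authors' implementation.
import torch
import torch.nn as nn


def fc_relu_bn(in_dim: int, out_dim: int) -> nn.Sequential:
    """One FC-ReLU-BN block; the paper stacks three per encoder/decoder."""
    return nn.Sequential(
        nn.Linear(in_dim, out_dim),
        nn.ReLU(inplace=True),
        nn.BatchNorm1d(out_dim),
    )


def make_coder(in_dim: int, out_dim: int) -> nn.Sequential:
    """Encoder (or, mirrored, decoder) of 3 stacked FC-ReLU-BN blocks."""
    return nn.Sequential(
        fc_relu_bn(in_dim, out_dim),
        fc_relu_bn(out_dim, out_dim),
        fc_relu_bn(out_dim, out_dim),
    )


# Encoded dimensions reported in the paper:
# B(theta) = 1024, B(beta) = 256, B(Pi) = 64.
# Raw input sizes (72 / 10 / 3) are an SMPL-convention assumption.
dims = {"theta": (72, 1024), "beta": (10, 256), "pi": (3, 64)}
encoders = nn.ModuleDict({k: make_coder(i, o) for k, (i, o) in dims.items()})
decoders = nn.ModuleDict({k: make_coder(o, i) for k, (i, o) in dims.items()})

# LSTM predictors with the reported hidden sizes (1024 / 256 / 64);
# feeding them the encoded features directly is an assumption.
predictors = nn.ModuleDict({
    k: nn.LSTM(input_size=o, hidden_size=o, batch_first=True)
    for k, (_, o) in dims.items()
})

# Stage 2 schedule quoted above: the whole prediction block is trained
# end to end for 30 epochs with SGD (Nesterov momentum 0.9), lr 0.01
# dropped to 0.001 after 12 epochs. Stage 1 (encoder + decoder only,
# lr 0.01, 6 epochs) would use a separate optimizer over fewer modules.
stage2_params = (list(encoders.parameters())
                 + list(decoders.parameters())
                 + list(predictors.parameters()))
optimizer = torch.optim.SGD(stage2_params, lr=0.01, momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[12], gamma=0.1)
```

The memory-related hyperparameters in the same row (memory item length t = 8, sampling interval K = 3, key/value channels of 512) belong to the paper's action-memory module, which this sketch omits.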