Harmonizing Stochasticity and Determinism: Scene-responsive Diverse Human Motion Prediction

Authors: Zhenyu Lou, Qiongjie Cui, Tuo Wang, Zhenbo Song, Luoming Zhang, Cheng Cheng, Haofan Wang, Xu Tang, Huaxia Li, Hong Zhou

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On two real-captured benchmarks, DiMoP3D demonstrates significant improvements over state-of-the-art methods, showcasing its effectiveness in generating diverse and physically consistent motion predictions within real-world 3D environments.
Researcher Affiliation | Collaboration | Zhenyu Lou¹, Qiongjie Cui², Tuo Wang⁴, Zhenbo Song², Luoming Zhang¹, Cheng Cheng⁵, Haofan Wang³, Xu Tang³, Huaxia Li³, Hong Zhou¹. Affiliations: ¹Zhejiang University, ²Nanjing University of Science and Technology, ³Xiaohongshu Inc., ⁴University of Texas at Austin, ⁵Concordia University.
Pseudocode | Yes | Algorithm 1: get_heightmap(S)
Open Source Code | Yes | More details and the video demo are available at the webpage https://sites.google.com/view/dimop3d. Justification: Appendix A describes the training details, and the supplemental material provides the source code.
Open Datasets | Yes | Dataset 1: GIMO [88], which records motion sequences represented by full-body SMPL-X poses, totaling 129K frames. It consists of 14 scenes with 3D point clouds; each scene, captured by a 3D LiDAR sensor, contains 10-20 objects with 500K vertices. Dataset 2: CIRCLE [4], which comprises 10 hours of high-fidelity full-body motion sequences from 5 subjects across nine apartment scenes.
Dataset Splits | No | For a fair comparison, we follow the official split to divide the dataset into training and testing sets according to the scenes.
Hardware Specification | Yes | All training is conducted on a single NVIDIA RTX 3090 GPU, with the complete pipeline converging in 8 hours.
Software Dependencies | No | The paper mentions software components such as a "ScanNet-pretrained SoftGroup model" but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA in the experimental setup or training details section (Appendix A).
Experiment Setup | Yes | We set L = 3 sec and L = 5 sec to achieve long-term prediction [40, 49]. Hyperparameters λcont = 3, λdist = 10, λobj = 1 are adjusted to maintain a balance among the factors. We set σ1 = 0.3 and σ2 = σ3 = 1.0 for balance.
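The paper's Algorithm 1 (get_heightmap) is only referenced by name above. A minimal sketch of one common way to compute such a scene heightmap, assuming the input is an (N, 3) point cloud and the map stores the maximum z per grid cell; the resolution `cell` and the ground-level fill value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def get_heightmap(points, cell=0.1):
    """Rasterize a 3D point cloud into a 2D heightmap.

    points: (N, 3) array of x, y, z coordinates.
    cell:   grid resolution in metres (assumed value).
    Returns a 2D array holding the maximum z in each xy cell.
    """
    xy_min = points[:, :2].min(axis=0)
    # map each point's xy position to an integer grid index
    idx = np.floor((points[:, :2] - xy_min) / cell).astype(int)
    shape = idx.max(axis=0) + 1
    hmap = np.full(tuple(shape), -np.inf)
    # keep the highest point that falls into each grid cell
    np.maximum.at(hmap, (idx[:, 0], idx[:, 1]), points[:, 2])
    hmap[np.isinf(hmap)] = 0.0  # empty cells default to ground level
    return hmap
```

`np.maximum.at` performs an unbuffered scatter-max, so cells hit by several points correctly retain the tallest one.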
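The λ weights in the experiment setup suggest a weighted sum of three training objectives. A minimal sketch under that assumption; the argument names `l_cont`, `l_dist`, and `l_obj` are hypothetical placeholders for the paper's contact, distance, and object terms:

```python
# Weights as reported in the paper's experiment setup.
LAMBDA_CONT, LAMBDA_DIST, LAMBDA_OBJ = 3.0, 10.0, 1.0

def total_loss(l_cont, l_dist, l_obj):
    """Combine the three loss terms with their reported weights."""
    return (LAMBDA_CONT * l_cont
            + LAMBDA_DIST * l_dist
            + LAMBDA_OBJ * l_obj)
```

With all three terms equal to 1.0 the combined loss is 3 + 10 + 1 = 14, which shows the distance term dominating the balance.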