Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos

Authors: Cuong Le, John Viktor Johansson, Manon Kok, Bastian Wandt

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on two human motion benchmark datasets. The first and main dataset is the popular Human3.6M [15]. ... We report the quantitative results of OSDCap and other related work on different metrics in Tab. 1. ... We conduct an ablation study to verify the impact of the optimal-state estimation process on simulated motions.
Researcher Affiliation | Academia | Cuong Le1, Viktor Johansson1, Manon Kok2 and Bastian Wandt1; 1Department of Electrical Engineering, Linköping University, Sweden; 2Delft Center for Systems and Control, Delft University of Technology, The Netherlands
Pseudocode | No | The paper describes the approach using text, mathematical equations, and flow diagrams (Figure 2 and Figure 4), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | The code is available on . (Abstract); The paper will provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results in the final version. (Question 5, NeurIPS checklist)
Open Datasets | Yes | We evaluate our approach on two human motion benchmark datasets. The first and main dataset is the popular Human3.6M [15]. ... The second dataset is Fit3D [7]... Since the scene settings from Human3.6M and Fit3D are very similar, we perform an additional evaluation on the new dataset Sports Pose [14].
Dataset Splits | Yes | Following previous work [38, 21], the first five subjects (S1, S5, S6, S7, S8) are used for training, and the last two (S9, S11) for evaluation. ... We split the data by taking samples from the 6 actors (s03, s04, s05, s07, s08, s10) for training, and 2 actors (s09, s11) for evaluation... For Sports Pose, we only consider sequences that contain a human at time step 0: (S02, S03, S05, S06, S07, S08, S09) for fine-tuning and (S12, S13, S14) for evaluation.
Hardware Specification | Yes | The proposed pipeline of OSDCap was trained and evaluated on an NVIDIA A100 GPU with 40 GB of memory.
Software Dependencies | No | The paper mentions software like RBDL [6], PyBullet [3], TRACE [40], and common functions like Leaky ReLU and LayerNorm. However, it does not specify explicit version numbers for any of these software dependencies.
Experiment Setup | Yes | The initial motion observation is generated by TRACE [40]. As suggested by [38, 8], all extracted motions are down-sampled from 100Hz to 50Hz. The samples are aligned to the world origin in the first frame, then split into 100-frame sub-sequences to utilize batch training and evaluation. ... OSDNet is trained for 15 epochs with a base learning rate of 5e-4 and a batch size of 64. The learning rates from all training processes are scheduled to reduce by a factor of 10 at epochs 10 and 13. Leaky ReLU and LayerNorm are used as the activation function and normalization for each linear layer of every module. We also apply a training warm-up strategy on the first 5 epochs by increasing the learning rate by a factor of 2 to the base learning rate at epoch 5.
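The learning-rate schedule quoted in the experiment-setup row can be sketched as a small Python helper. This is a minimal sketch, not the authors' code: the exact warm-up shape is an assumption (interpreted here as a linear ramp from half the base rate up to the base rate at epoch 5, matching "increasing the learning rate by a factor of 2 to the base learning rate at epoch 5"), and the decays at epochs 10 and 13 are assumed multiplicative.

```python
def osdcap_lr(epoch, base_lr=5e-4):
    """Per-epoch learning rate for a 15-epoch run, as described in the paper.

    Assumptions (not spelled out in the text): the warm-up ramps linearly
    from base_lr / 2 at epoch 0 to base_lr at epoch 5, and each scheduled
    reduction (epochs 10 and 13) multiplies the rate by 0.1.
    """
    if epoch < 5:
        # Warm-up: grow by a total factor of 2 over the first 5 epochs.
        lr = base_lr / 2 + (base_lr / 2) * epoch / 5
    else:
        lr = base_lr
    if epoch >= 10:
        lr *= 0.1  # first scheduled reduction
    if epoch >= 13:
        lr *= 0.1  # second scheduled reduction
    return lr
```

In a PyTorch training loop, the same shape could be realized with a `LambdaLR` scheduler wrapping this function, but the plain helper keeps the assumed schedule explicit.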