Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics

Authors: Xueyuan Yang, Chao Yao, Xiaojuan Ban

AAAI 2024

Reproducibility assessment: each entry lists the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
LLM response: Experimental results demonstrate our proposed approach achieves significant improvements in multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion.

Researcher Affiliation: Academia
LLM response: Xueyuan Yang (1,2), Chao Yao (1,2)*, Xiaojuan Ban (1,2,3,4)*. (1) Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing 100083, China; (2) University of Science and Technology Beijing, Beijing 100083, China; (3) Key Laboratory of Intelligent Bionic Unmanned Systems, Ministry of Education, Beijing 100083, China; (4) Institute of Materials Intelligent Technology, Liaoning Academy of Materials, Shenyang 110004, China. m202210673@xs.ustb.edu.cn, {yaochao, banxj}@ustb.edu.cn

Pseudocode: No
LLM response: The paper describes its modules and processes through text and diagrams (e.g., Figure 2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.

Open Source Code: No
LLM response: The paper does not contain any explicit statement about releasing its source code for the described methodology, nor does it provide a link to a code repository.

Open Datasets: Yes
LLM response: We utilized the Babel dataset (Punnakkal et al. 2021) for semantic annotations... For the DIP-IMU dataset (Huang et al. 2018)... The AMASS dataset... TotalCapture dataset (Trumble et al. 2017)...

Dataset Splits: No
LLM response: The paper assigns Subjects 9 and 10 of DIP-IMU to evaluation and the remaining subjects to training, but it does not specify a separate validation split (or its size) for hyperparameter tuning or early stopping, as is common in machine learning contexts.

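A minimal sketch of the subject-level split described above, in Python. The per-subject directory names (s_01 through s_10) are an assumption about the released DIP-IMU layout, not something the paper or the response specifies:

```python
from pathlib import Path

# Subjects the paper holds out for evaluation; all others train.
EVAL_SUBJECTS = {"s_09", "s_10"}

def split_dip_imu(root: str):
    """Partition per-subject directories (assumed layout: s_01 ... s_10)."""
    train, test = [], []
    for subject_dir in sorted(Path(root).iterdir()):
        if not subject_dir.is_dir():
            continue
        (test if subject_dir.name in EVAL_SUBJECTS else train).append(subject_dir)
    return train, test

train_dirs, test_dirs = split_dip_imu("DIP_IMU")  # hypothetical root path
print(len(train_dirs), "training subjects,", len(test_dirs), "eval subjects")
```
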
Hardware Specification: Yes
LLM response: The entire training and evaluation regimen was conducted on a system equipped with 1 Intel(R) Xeon(R) Silver 4110 CPU and 1 NVIDIA GeForce RTX 2080 Ti GPU.

Software Dependencies: Yes
LLM response: Our model was developed using PyTorch 1.13.0, further accelerated by CUDA 11.6.

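For anyone attempting reproduction, a quick sanity check of the local environment against the reported versions (PyTorch 1.13.0, CUDA 11.6); the snippet only prints what the installation provides:

```python
import torch

print("PyTorch:", torch.__version__)              # paper reports 1.13.0
print("CUDA build:", torch.version.cuda)          # paper reports 11.6
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # paper reports RTX 2080 Ti
```
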
Experiment Setup: Yes
LLM response: Our model configuration sets the input sequence length T at 80 frames, with a window and shifted size of 20 and 10 frames, respectively, and a threshold M of 15. The training process, utilizing a batch size of 40, incorporates the Adam optimizer (Kingma and Ba 2017) initialized with a learning rate of 2e-5. To balance the magnitude of the loss, we set λ and α to 1, β to 10, δ to 0.1, and γ to 0.01.
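A runnable sketch of one training step under the reported configuration. Only the hyperparameter values and loss weights come from the paper; the architecture is not released, so a single linear layer stands in for the model, the 72-dimensional per-frame IMU input follows the common six-sensor DIP-IMU convention (an assumption), and the five loss terms are placeholders rather than the paper's actual objectives:

```python
import torch
from torch import nn

# Hyperparameters reported in the paper (window/shift/M shown for completeness).
SEQ_LEN, WINDOW, SHIFT, M = 80, 20, 10, 15
BATCH_SIZE, LR = 40, 2e-5
LAMBDA, ALPHA, BETA, DELTA, GAMMA = 1.0, 1.0, 10.0, 0.1, 0.01

# Stand-in model (hypothetical): maps per-frame IMU features
# (6 sensors x 12 values = 72 dims, an assumed convention) to a flat pose vector.
model = nn.Linear(72, 24 * 6)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

# One synthetic step to show how the weighted loss terms combine.
imu = torch.randn(BATCH_SIZE, SEQ_LEN, 72)
target = torch.randn(BATCH_SIZE, SEQ_LEN, 24 * 6)
pred = model(imu)

# Placeholder loss terms; the paper's definitions are not reproduced here,
# only its reported weighting scheme.
l_pose = nn.functional.mse_loss(pred, target)
l_joint = nn.functional.l1_loss(pred, target)
l_text = torch.tensor(0.0)                              # textual-semantics term
l_smooth = (pred[:, 1:] - pred[:, :-1]).pow(2).mean()   # temporal smoothness
l_reg = sum(p.pow(2).sum() for p in model.parameters())

loss = (LAMBDA * l_pose + ALPHA * l_joint + BETA * l_text
        + DELTA * l_smooth + GAMMA * l_reg)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note how the reported weights span three orders of magnitude (β = 10 down to γ = 0.01), which is consistent with the paper's stated goal of balancing loss-term magnitudes.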