CigTime: Corrective Instruction Generation Through Inverse Motion Editing

Authors: Qihang Fang, Chengcheng Tang, Bugra Tekin, Yanchao Yang

Venue: NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present both qualitative and quantitative results across a diverse range of applications that largely improve upon baselines. Our approach demonstrates its effectiveness in instructional scenarios, offering text-based guidance to correct and enhance user performance.
Researcher Affiliation | Collaboration | Qihang Fang (The University of Hong Kong), Chengcheng Tang (Meta Reality Labs), Bugra Tekin (Meta Reality Labs), and Yanchao Yang (The University of Hong Kong); {qihfang}@gmail.com, {chengcheng.tang,bugratekin}@meta.com, {yanchaoy}@hku.hk
Pseudocode | No | The paper describes the methodology using text and equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | We will release the codes and our generated dataset after acceptance.
Open Datasets | Yes | We obtain the source motion sequences from HumanML3D [13], a dataset containing 3D human motions and associated language descriptions. ... To evaluate the generalization ability of our algorithm, we collected 1525 samples from the Fit3D [11] dataset. ... We further evaluate our method and the baselines on the KIT dataset.
Dataset Splits | Yes | We split HumanML3D following the original setting, and for each motion sequence in HumanML3D we randomly select one instruction from the corresponding split for editing the sequence. (A pairing sketch follows the table.)
Hardware Specification | Yes | We use a batch size of 512 and train on four NVIDIA Tesla A100 GPUs for eight epochs, which takes approximately 5 hours to complete.
Software Dependencies | No | The paper mentions Llama-3-8B and the Adam optimizer but does not provide specific version numbers for programming languages, libraries (e.g., PyTorch), or other ancillary software dependencies.
Experiment Setup | Yes | We fine-tune a pre-trained Llama-3-8B [30] using full-parameter fine-tuning for corrective instruction generation. The model is optimized using the Adam optimizer with an initial learning rate of 10^-5. We use a batch size of 512 and train on four NVIDIA Tesla A100 GPUs for eight epochs, which takes approximately 5 hours to complete. (A fine-tuning sketch follows the table.)
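
The "Dataset Splits" row states that each HumanML3D motion sequence is paired with one instruction drawn at random from the corresponding split. The snippet below is a minimal sketch of that pairing step, assuming a plain-text ID file per HumanML3D split and a per-split pool of editing instructions; the file layout, function names, and loaders are hypothetical, since the authors' data-generation code has not been released.

```python
import random

def load_split_ids(split):
    """Read HumanML3D sequence IDs for a split (assumed 'train'/'val'/'test' text files)."""
    with open(f"HumanML3D/{split}.txt") as f:  # assumed file layout
        return [line.strip() for line in f if line.strip()]

def pair_motions_with_instructions(split, instructions_by_split, seed=0):
    """Attach one randomly chosen editing instruction to each motion sequence in a split."""
    rng = random.Random(seed)
    return [
        {"sequence": seq_id, "instruction": rng.choice(instructions_by_split[split])}
        for seq_id in load_split_ids(split)
    ]

# Example with a toy instruction pool:
# pairs = pair_motions_with_instructions("train", {"train": ["raise the left arm higher"]})
```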
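
The "Experiment Setup" row gives enough detail (full-parameter fine-tuning of Llama-3-8B, Adam at a 1e-5 learning rate, batch size 512, eight epochs on four A100s) to outline a training loop. The sketch below assumes a Hugging Face Transformers checkpoint and a pre-tokenized dataloader; the checkpoint name, loss formulation, and distributed/gradient-accumulation details are assumptions, not the authors' released implementation.

```python
import torch
from torch.optim import Adam
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint identifier

def finetune(train_loader, epochs=8, lr=1e-5, device="cuda"):
    """Full-parameter causal-LM fine-tuning, following the hyperparameters quoted above."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.bfloat16
    ).to(device)
    model.train()
    optimizer = Adam(model.parameters(), lr=lr)  # paper reports Adam with lr 1e-5

    for _ in range(epochs):  # paper: eight epochs
        for batch in train_loader:  # batch: dict with input_ids / attention_mask
            batch = {k: v.to(device) for k, v in batch.items()}
            # Next-token loss over (motion-conditioned prompt, corrective instruction) text
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model, tokenizer
```

In practice, the reported batch size of 512 across four GPUs would be reached with data parallelism plus gradient accumulation, which this single-device sketch omits.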