FLAME: Free-Form Language-Based Motion Synthesis & Editing

Authors: Jihoon Kim, Jiseob Kim, Sungjoon Choi

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that FLAME achieves state-of-the-art generation performances on three text-motion datasets: HumanML3D, BABEL, and KIT. [...] Quantitative Results on Text-to-Motion: We compare our method to four state-of-the-art models: Lin et al. (2018), Language2Pose (Ahuja and Morency 2019), Ghosh et al. (2021), TEMOS (Petrovich, Black, and Varol 2022), and Guo et al. (2022). In case of comparing models using PLM, we replace the PLM with the same model, RoBERTa, to prevent the selection of PLM from influencing the benchmark. Table 1 and Table 2 present the benchmark results on the three datasets.
Researcher Affiliation | Collaboration | Jihoon Kim (1,2), Jiseob Kim (2), Sungjoon Choi (1); 1 Korea University, 2 Kakao Brain
Pseudocode | No | The paper describes the model architecture and processes, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement about open-sourcing the code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | 1. HumanML3D (Guo et al. 2022) is a recently proposed large motion-text pair dataset containing 44,970 full-sentence text descriptions for 14,616 motions from AMASS (Mahmood et al. 2019) and HumanAct12 (Guo et al. 2020). [...] 2. BABEL (Punnakkal et al. 2021) provides a language description for AMASS. [...] 3. KIT (Plappert, Mandery, and Asfour 2016) consists of 3,911 motion sequences paired with 6,353 textual descriptions.
Dataset Splits | No | The paper refers to using a 'test set' for evaluation and mentions following the evaluation protocol of TEMOS for the KIT dataset, but it does not provide explicit details on the training, validation, and test splits (e.g., percentages or sample counts) for any of the datasets.
Hardware Specification | Yes | We train the FLAME model using 4 NVIDIA Tesla V100 SXM2 32GB for 600K steps on the HumanML3D, 1M steps on the BABEL, and 200K steps on the KIT dataset. [...] Performance is recorded on a single NVIDIA Tesla V100 SXM2 32GB machine.
Software Dependencies | No | The paper mentions using the 'AdamW' optimizer and the 'RoBERTa' pre-trained language model with citations, but it does not provide specific version numbers for general software dependencies such as the programming language or deep learning framework (e.g., Python or PyTorch versions).
Experiment Setup | Yes | Our FLAME model uses 1,000 diffusion time steps to learn the reverse process with cosine beta scheduling (Nichol and Dhariwal 2021). AdamW (Loshchilov and Hutter 2017) is used for experiments with learning rate of 0.0001 and weight decay of 0.0001. For classifier-free guidance, 25% of texts are replaced with empty strings during training, and the classifier guidance scale of 8.0 is used for sampling.
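For reference, the quoted experiment setup (1,000 diffusion steps, cosine beta schedule, AdamW with learning rate and weight decay of 1e-4, 25% caption dropout, guidance scale 8.0) can be captured in a short configuration sketch. The snippet below is a minimal, self-contained illustration assuming a PyTorch-style setup; the `MotionDenoiser` module, the pose dimension, and the standard classifier-free guidance combination are assumptions for illustration, not the authors' released implementation.

```python
import math
import torch

# Hyperparameters quoted in the paper's experiment setup.
NUM_DIFFUSION_STEPS = 1000
LEARNING_RATE = 1e-4
WEIGHT_DECAY = 1e-4
TEXT_DROPOUT_PROB = 0.25   # 25% of captions replaced with "" during training
GUIDANCE_SCALE = 8.0       # guidance scale used at sampling time


def cosine_beta_schedule(timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Cosine beta schedule of Nichol and Dhariwal (2021)."""
    steps = torch.arange(timesteps + 1, dtype=torch.float64)
    alphas_cumprod = torch.cos(((steps / timesteps) + s) / (1 + s) * math.pi / 2) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1.0 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(max=0.999).float()


class MotionDenoiser(torch.nn.Module):
    """Hypothetical stand-in for FLAME's transformer denoiser (not the real architecture)."""

    def __init__(self, motion_dim: int = 69):  # pose dimension is illustrative only
        super().__init__()
        self.net = torch.nn.Linear(motion_dim + 1, motion_dim)

    def forward(self, x_t, t, captions):
        # The real model conditions on RoBERTa text features; captions are ignored
        # here only to keep the sketch self-contained.
        t_feat = t.float().view(-1, 1, 1).expand(x_t.size(0), x_t.size(1), 1)
        return self.net(torch.cat([x_t, t_feat], dim=-1))


def drop_text_for_cfg(captions):
    """Replace roughly 25% of captions with the empty string during training."""
    return ["" if torch.rand(()).item() < TEXT_DROPOUT_PROB else c for c in captions]


def guided_epsilon(model, x_t, t, captions):
    """Standard classifier-free guidance combination with scale 8.0 at sampling."""
    eps_cond = model(x_t, t, captions)
    eps_uncond = model(x_t, t, [""] * len(captions))
    return eps_uncond + GUIDANCE_SCALE * (eps_cond - eps_uncond)


if __name__ == "__main__":
    betas = cosine_beta_schedule(NUM_DIFFUSION_STEPS)          # 1,000 diffusion steps
    model = MotionDenoiser()
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=LEARNING_RATE,
                                  weight_decay=WEIGHT_DECAY)   # AdamW, lr = wd = 1e-4

    x_t = torch.randn(2, 32, 69)                               # (batch, frames, pose dim)
    t = torch.randint(0, NUM_DIFFUSION_STEPS, (2,))
    eps = guided_epsilon(model, x_t, t, ["a person walks", "a person jumps"])
    print(betas.shape, eps.shape)
```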