Scalable Motion Style Transfer with Constrained Diffusion Generation

Authors: Wenjie Yin, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our validation demonstrates the success of training separate models to transfer between as many as ten dance motion styles. Comprehensive experiments find a significant improvement in preserving motion contents in comparison to baseline and ablative diffusion-based style transfer models. In addition, we perform a human study for a subjective assessment of the quality of generated dance motions.
Researcher Affiliation | Academia | KTH Royal Institute of Technology, Sweden; National Institute of Informatics, Japan; University of Copenhagen, Denmark
Pseudocode | Yes | Algorithm 1: Motion style transfer with DDIBs (see the sketch after the table)
Open Source Code | Yes | The code and summary are available at https://github.com/YIN95/ddst_motion.
Open Datasets | Yes | We evaluate our system on the 100STYLE (Mason, Starke, and Komura 2022) locomotion database and the AIST++ (Tsuchida et al. 2019) dance database.
Dataset Splits | No | The paper does not provide explicit training/test/validation dataset splits with percentages or sample counts. It mentions using '150-frame clips for experiments' and '90 dance sequences for each style' for evaluation, but not how the data was partitioned for training, validation, and testing.
Hardware Specification | No | The paper mentions benefiting from access to 'HPC resources provided by the Swedish National Infrastructure for Computing (SNIC)' but does not provide specific hardware details such as exact GPU/CPU models or memory amounts used for experiments.
Software Dependencies | No | The paper mentions software components such as the 'Jukebox' model, 'CLIP', 'RoBERTa', and 'Conformer', but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | No | The paper provides details on data preparation (e.g., 'downsample both motion datasets to 30 fps and use 150-frame clips') and data representation, but does not specify key experimental setup details such as learning rates, batch sizes, number of epochs, or optimizer settings (see the preprocessing sketch below).
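The paper's Algorithm 1 builds on Dual Diffusion Implicit Bridges (DDIBs), which chain two deterministic DDIM (probability-flow ODE) passes: the source-style diffusion model encodes a motion clip into a shared Gaussian latent, and the target-style model decodes that latent into the target style. The sketch below illustrates only this idea; the function names, the model(x, alpha_bar) interface, and the omission of the paper's content-preserving constraints are simplifying assumptions for illustration, not the authors' implementation.

import math
import torch

def ddim_step(x, eps, abar_from, abar_to):
    # Deterministic DDIM update between two noise levels; the same formula
    # moves the sample toward noise (encoding) or toward data (decoding).
    x0_pred = (x - math.sqrt(1.0 - abar_from) * eps) / math.sqrt(abar_from)
    return math.sqrt(abar_to) * x0_pred + math.sqrt(1.0 - abar_to) * eps

@torch.no_grad()
def ode_solve(model, x, abars):
    # Integrate the probability-flow ODE along a schedule of cumulative
    # alphas (floats in (0, 1]); a decreasing schedule encodes a clip into
    # the shared Gaussian latent, an increasing one decodes it.
    for abar_from, abar_to in zip(abars[:-1], abars[1:]):
        eps = model(x, abar_from)  # hypothetical noise-prediction interface
        x = ddim_step(x, eps, abar_from, abar_to)
    return x

def style_transfer(source_model, target_model, x_source, abars):
    # DDIB-style transfer between two independently trained per-style
    # motion diffusion models: encode with the source-style model, then
    # decode with the target-style model. The paper additionally constrains
    # the decoding pass to preserve motion content (omitted in this sketch).
    encode_schedule = sorted(abars, reverse=True)  # data -> latent
    decode_schedule = sorted(abars)                # latent -> data
    x_latent = ode_solve(source_model, x_source, encode_schedule)
    return ode_solve(target_model, x_latent, decode_schedule)

Because each style only needs its own diffusion model, any source style can be bridged to any target style through the shared latent, which is what makes the approach scale to many styles without pairwise models.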
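The only preprocessing detail quoted above is downsampling both motion datasets to 30 fps and using 150-frame clips. A minimal sketch of that step follows, assuming a 60 fps source frame rate, integer frame skipping, non-overlapping clips, and a (frames, features) array layout; none of these choices is stated in the summary.

import numpy as np

def prepare_clips(motion, source_fps=60, target_fps=30, clip_len=150):
    # Downsample a (frames, features) motion array by frame skipping, then
    # cut it into fixed-length clips, dropping any trailing remainder.
    step = source_fps // target_fps  # 60 fps -> 30 fps keeps every 2nd frame
    downsampled = motion[::step]
    n_clips = len(downsampled) // clip_len
    return [downsampled[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

# Hypothetical example: 150 s of 60 fps motion with 75 features per frame
# yields 30 non-overlapping 150-frame (5-second) clips at 30 fps.
sequence = np.zeros((9000, 75))
clips = prepare_clips(sequence)
assert len(clips) == 30 and clips[0].shape == (150, 75)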