Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

Authors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS", "Evaluation Metrics. We apply a series of quantitative metrics to evaluate the generated follower's movement in three distinct aspects:", "Table 2: Quantitative benchmark for dance accompaniment.", "we also perform ablation studies using various Duolando variants."
Researcher Affiliation | Collaboration | ¹S-Lab, Nanyang Technological University; ²Lexica; ³SenseTime; ⁴Shanghai AI Laboratory
Pseudocode | Yes | "Algorithm 1 Off-Policy RL in Duolando"
Open Source Code | No | "Code and data will be publicly available upon acceptance."
Open Datasets | No | "To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances." and "Code and data will be publicly available upon acceptance."
Dataset Splits | No | "In the experiment, we randomly split the dataset into the 80% training set and 20% test set, with the training set of 168,176 frames (5605.9 seconds) and the test set of 42,496 frames (1416.5 seconds)."
Hardware Specification | Yes | "The training is conducted on four NVIDIA Tesla V100 GPUs, taking approximately seven days in total."
Software Dependencies | No | The paper mentions "Librosa (McFee et al., 2015)" as an audio processing toolbox but does not provide a specific version number for it or for other software dependencies.
Experiment Setup | Yes | "In terms of hyper-parameters, we set the codebook capacity K to 512 for all quantized items (z^up, z^down, z^lhand, z^rhand, and z^tr). During VQ-VAE training, we segment motion sequences into 4-second (T = 120) slices with a batch size of 64. The encoder's temporal downsampling rate d is set to 4, and the commitment trade-off λ is 0.1. We adopt a learning rate of 3 × 10⁻⁵ to train the VQ-VAE for 500 epochs, with a decay of 0.1 after both the 200th and 300th epochs. Throughout the supervised training stage of GPT, we adopt cross-entropy loss with a learning rate of 10⁻⁴ for 500 epochs. For the reinforcement learning stage, we apply L^off_RL with a learning rate of 3 × 10⁻⁵ for 50 epochs. The OOD dataset D̂ is compiled on the sequences generated by GPT conditioned on the music and leader motion in the test set, without using any ground-truth follower information. During the entire training process, we employ the Adam optimizer (Kingma & Ba, 2014) with β₁ = 0.9 and β₂ = 0.99."
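
The training schedule quoted in the Experiment Setup row can be summarized as a small configuration sketch. The Python/PyTorch snippet below is an illustrative assumption, not the authors' released code: the variable and function names, the 30 fps frame rate implied by T = 120 frames per 4-second slice, and the use of torch.optim.lr_scheduler.MultiStepLR for the 0.1 decay at epochs 200 and 300 are all inferred from the quoted numbers, not taken from the paper's implementation.

import torch

# Hyper-parameters reported in the paper; names here are assumptions.
CODEBOOK_SIZE = 512       # K, shared by z^up, z^down, z^lhand, z^rhand, z^tr
SEQ_LEN = 120             # T, a 4-second motion slice (implies 30 fps)
BATCH_SIZE = 64           # VQ-VAE training batch size
DOWNSAMPLE_RATE = 4       # encoder temporal downsampling rate d
COMMIT_LAMBDA = 0.1       # commitment trade-off λ

def make_optimizer_and_scheduler(model, stage):
    """Builds Adam (β1 = 0.9, β2 = 0.99) and an LR schedule for each of the
    three training stages described in the Experiment Setup row."""
    betas = (0.9, 0.99)
    if stage == "vqvae":
        # 500 epochs at 3e-5, decayed by 0.1 after the 200th and 300th epochs.
        opt = torch.optim.Adam(model.parameters(), lr=3e-5, betas=betas)
        sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[200, 300], gamma=0.1)
    elif stage == "gpt_supervised":
        # 500 epochs of cross-entropy training at 1e-4.
        opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=betas)
        sched = None
    elif stage == "off_policy_rl":
        # 50 epochs of off-policy RL fine-tuning at 3e-5 on the OOD set D̂.
        opt = torch.optim.Adam(model.parameters(), lr=3e-5, betas=betas)
        sched = None
    else:
        raise ValueError(f"unknown training stage: {stage}")
    return opt, sched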