Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
Authors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTS", "Evaluation Metrics. We apply a series of quantitative metrics to evaluate the generated follower's movement in three distinct aspects:", "Table 2: Quantitative benchmark for dance accompaniment.", "we also perform ablation studies using various Duolando variants." |
| Researcher Affiliation | Collaboration | ¹S-Lab, Nanyang Technological University; ²Lexica; ³SenseTime; ⁴Shanghai AI Laboratory |
| Pseudocode | Yes | Algorithm 1 Off-Policy RL in Duolando |
| Open Source Code | No | Code and data will be publicly available upon acceptance. |
| Open Datasets | No | "To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances." and "Code and data will be publicly available upon acceptance." |
| Dataset Splits | Yes | In the experiment, we randomly split the dataset into the 80% training set and 20% test set, with the training set of 168,176 frames (5605.9 seconds) and the test set of 42,496 frames (1416.5 seconds). (See the split sketch after the table.) |
| Hardware Specification | Yes | The training is conducted on four NVIDIA Tesla V100 GPUs, taking approximately seven days in total. |
| Software Dependencies | No | The paper mentions 'Librosa (McFee et al., 2015)' as an audio processing toolbox but does not provide a specific version number for it or for other software dependencies. |
| Experiment Setup | Yes | In terms of hyper-parameters, we set the codebook capacity K to 512 for all quantized items (z^up, z^down, z^lhand, z^rhand, and z^tr). During VQ-VAE training, we segment motion sequences into 4-second (T = 120) slices with a batch size of 64. The encoder's temporal downsampling rate d is set to 4, and the commitment trade-off λ is 0.1. We adopt a learning rate of 3×10⁻⁵ to train the VQ-VAE for 500 epochs, with a decay of 0.1 after both the 200th and 300th epochs. Throughout the supervised training stage of the GPT, we adopt cross-entropy loss with a learning rate of 10⁻⁴ for 500 epochs. For the reinforcement learning stage, we apply L_offRL with a learning rate of 3×10⁻⁵ for 50 epochs. The OOD dataset D̂ is compiled from the sequences generated by the GPT conditioned on the music and leader motion in the test set, without using any ground-truth follower information. During the entire training process, we employ the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.99. (See the configuration and OOD-compilation sketches after the table.) |
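
A few of the rows above describe procedures concrete enough to sketch in code. First, the Dataset Splits row: the paper reports a random 80%/20% split of DD100. The sketch below is a minimal illustration assuming the dataset is a list of motion sequences; the function name `split_dd100`, the seed, and the data representation are our assumptions, not the authors' code.

```python
import random

def split_dd100(sequences, train_ratio=0.8, seed=0):
    """Randomly split a list of motion sequences into train/test subsets.

    The paper reports an 80%/20% random split (168,176 training frames
    vs. 42,496 test frames; 168,176 / 5,605.9 s implies 30 fps). The
    seed and function name here are illustrative assumptions.
    """
    rng = random.Random(seed)
    indices = list(range(len(sequences)))
    rng.shuffle(indices)
    cut = int(len(indices) * train_ratio)
    train = [sequences[i] for i in indices[:cut]]
    test = [sequences[i] for i in indices[cut:]]
    return train, test
```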
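
Second, the hyper-parameters in the Experiment Setup row can be gathered into one configuration object for easier scanning. Only the numeric values below come from the paper; the class and field names are our own. The reported decay corresponds to a step schedule (for example, PyTorch's `MultiStepLR` with `milestones=[200, 300]` and `gamma=0.1`), though the paper does not name a scheduler implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DuolandoTrainingConfig:
    # --- VQ-VAE stage ---
    codebook_size: int = 512        # K, shared by z^up, z^down, z^lhand, z^rhand, z^tr
    segment_frames: int = 120       # T: 4-second slices (30 fps)
    batch_size: int = 64
    temporal_downsampling: int = 4  # encoder downsampling rate d
    commitment_lambda: float = 0.1  # commitment trade-off λ
    vqvae_lr: float = 3e-5
    vqvae_epochs: int = 500
    lr_decay: float = 0.1           # applied after the 200th and 300th epochs
    lr_decay_epochs: tuple = (200, 300)
    # --- Supervised GPT stage (cross-entropy loss) ---
    gpt_lr: float = 1e-4
    gpt_epochs: int = 500
    # --- Off-policy RL stage (L_offRL) ---
    rl_lr: float = 3e-5
    rl_epochs: int = 50
    # --- Optimizer used throughout: Adam ---
    adam_betas: tuple = (0.9, 0.99)
```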
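
Finally, the OOD dataset D̂ is described as sequences the trained GPT generates on test-set music and leader motion, with no ground-truth follower data. Below is a minimal sketch of that compilation step, assuming a hypothetical `generate_follower` callable standing in for the GPT's autoregressive sampler; its signature and the clip dictionary layout are our assumptions.

```python
def compile_ood_dataset(test_clips, generate_follower):
    """Compile the OOD dataset D̂ for the off-policy RL stage.

    For each test clip, the trained GPT generates follower tokens
    conditioned only on the music and the leader's motion; no
    ground-truth follower information is touched. `generate_follower`
    is a hypothetical stand-in for the model's sampler.
    """
    ood = []
    for clip in test_clips:
        follower_tokens = generate_follower(music=clip["music"],
                                            leader=clip["leader"])
        ood.append({
            "music": clip["music"],
            "leader": clip["leader"],
            "follower_tokens": follower_tokens,  # generated, not ground truth
        })
    return ood
```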