Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
Authors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTS", "Evaluation Metrics. We apply a series of quantitative metrics to evaluate the generated follower's movement in three distinct aspects:", "Table 2: Quantitative benchmark for dance accompaniment.", "we also perform ablation studies using various Duolando variants." |
| Researcher Affiliation | Collaboration | ¹S-Lab, Nanyang Technological University; ²Lexica; ³SenseTime; ⁴Shanghai AI Laboratory |
| Pseudocode | Yes | Algorithm 1 Off-Policy RL in Duolando |
| Open Source Code | No | Code and data will be publicly available upon acceptance. |
| Open Datasets | No | "To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances." and "Code and data will be publicly available upon acceptance." |
| Dataset Splits | No | In the experiment, we randomly split the dataset into the 80% training set and 20% test set, with the training set of 168,176 frames (5605.9 seconds) and the test set of 42,496 frames (1416.5 seconds). |
| Hardware Specification | Yes | The training is conducted on four NVIDIA Tesla V100 GPUs, taking approximately seven days in total. |
| Software Dependencies | No | The paper mentions 'Librosa (McFee et al., 2015)' as an audio processing toolbox but does not provide a specific version number for it or other software dependencies. |
| Experiment Setup | Yes | In terms of hyper-parameters, we set the codebook capacity K to 512 for all quantized items (zup, zdown, zlhand, zrhand, and ztr). During VQ-VAE training, we segment motion sequences into 4-second (T = 120) slices with a batch size of 64. The encoder's temporal downsampling rate d is set to 4, and the commitment trade-off λ is 0.1. We adopt a learning rate of 3 × 10⁻⁵ to train the VQ-VAE for 500 epochs, with a decay of 0.1 after both the 200th and 300th epochs. Throughout the supervised training state of GPT, we adopt cross-entropy loss with a learning rate of 10⁻⁴ for 500 epochs. For the reinforcement learning stage, we apply L^off_RL learning rate of 3 × 10⁻⁵ for 50 epochs. The OOD dataset D̂ is compiled on the sequences generated by GPT conditioned on the music and leader motion in test set, without using any ground-truth follower information. During the entire training process, we employ the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9 and β2 = 0.99. |
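
The VQ-VAE hyper-parameters and dataset split figures quoted in the table can be collected into a small configuration sketch, which may help anyone attempting a reproduction. This is not the authors' code, which the table reports as unreleased. The names `VQVAEConfig`, `lr_at_epoch`, and `frames_to_seconds` are hypothetical. The 30 fps frame rate is an assumption inferred from "4-second (T = 120) slices", and the step-decay reading of "a decay of 0.1 after both the 200th and 300th epochs" (multiply the rate by 0.1 at each milestone, with 0-indexed epochs) is one plausible interpretation.

```python
from dataclasses import dataclass

# Assumption: 120 frames per 4-second slice implies 30 frames per second.
FPS = 30


@dataclass(frozen=True)
class VQVAEConfig:
    """Hypothetical container for the VQ-VAE settings quoted in the table."""
    codebook_size: int = 512          # K, shared by all quantized items
    slice_frames: int = 120           # T, one 4-second slice
    batch_size: int = 64
    downsample_rate: int = 4          # d, encoder temporal downsampling
    commitment_weight: float = 0.1    # lambda
    base_lr: float = 3e-5
    epochs: int = 500
    decay_factor: float = 0.1
    decay_epochs: tuple = (200, 300)  # assumed 0-indexed milestones
    adam_betas: tuple = (0.9, 0.99)


def lr_at_epoch(cfg: VQVAEConfig, epoch: int) -> float:
    """Step decay: apply decay_factor once per milestone already reached."""
    passed = sum(1 for milestone in cfg.decay_epochs if epoch >= milestone)
    return cfg.base_lr * cfg.decay_factor ** passed


def frames_to_seconds(frames: int, fps: int = FPS) -> float:
    """Convert a frame count to seconds at the assumed frame rate."""
    return frames / fps


if __name__ == "__main__":
    cfg = VQVAEConfig()
    train_frames, test_frames = 168_176, 42_496  # figures from the table
    print("train seconds:", round(frames_to_seconds(train_frames), 1))
    print("test seconds:", round(frames_to_seconds(test_frames), 1))
    print("train fraction:", round(train_frames / (train_frames + test_frames), 3))
    print("latent steps per slice:", cfg.slice_frames // cfg.downsample_rate)
    for epoch in (0, 200, 300):
        print(f"lr at epoch {epoch}:", lr_at_epoch(cfg, epoch))
```

Under the 30 fps assumption, the frame counts reproduce the durations quoted in the Dataset Splits row (5605.9 and 1416.5 seconds), and the training share comes out slightly under the nominal 80%. In a PyTorch reproduction the same schedule would typically be expressed with a multi-step learning-rate scheduler; the pure-Python form is used here only to keep the sketch free of dependencies.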