Dancing to Music
Authors: Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental qualitative and quantitative results demonstrate that the proposed method can synthesize realistic, diverse, style-consistent, and beat-matching dances from music. We conduct extensive experiments to evaluate the proposed decomposition-to-composition framework. |
| Researcher Affiliation | Collaboration | ¹University of California, Merced; ²NVIDIA |
| Pseudocode | No | The paper describes its methods in text and uses equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our data, code and models are publicly available at our website. |
| Open Datasets | Yes | Finally, we provide a large-scale paired music and dance dataset, which is available along with the source code and models at our website. |
| Dataset Splits | No | The paper states 'We randomly select 300 music clips for testing and the rest used for training.' but does not explicitly detail a separate validation split or its size/percentage. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as PyTorch, OpenPose, and FFmpeg, but does not provide version numbers for any of them. |
| Experiment Setup | Yes | Our model is implemented in PyTorch. We use the gated recurrent unit (GRU) to build encoders E_mov, E_mtd and decoders G_uni, G_dan. Each of them is a single-layer GRU with 1024 hidden units. E_ini, E_std, and E_sty are encoders consisting of 3 fully-connected layers. D_dan and D_mov are discriminators containing 5 fully-connected layers with layer normalization. We set the latent code dimensions to z_ini ∈ R^10, z_mov ∈ R^512, and z_dan ∈ R^512. In the decomposition phase, we set the length of a dance unit to 32 frames and the number of beat times within a dance unit to 4. In the composition phase, each input sequence contains 3 to 5 dance units. For training, we use the Adam optimizer [19] with a batch size of 512, a learning rate of 0.0001, and exponential decay rates (β1, β2) = (0.5, 0.999). In all experiments, we set the hyper-parameters as follows: λ_KL^u = λ_KL^d = 0.01, λ_recon^shift = 1, λ_adv^d = λ_adv^m = 0.1, and λ_recon^s = 1. |
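For concreteness, below is a minimal PyTorch sketch of the components named in the experiment-setup row: a single-layer GRU encoder with 1024 hidden units, a 3-layer fully-connected encoder, a 5-layer fully-connected discriminator with layer normalization, and the reported Adam settings. The class names, the pose feature dimension `POSE_DIM`, the MLP/discriminator widths, and the activation functions are illustrative assumptions not stated in the paper; the authors' released code is the authoritative reference.

```python
import torch
import torch.nn as nn

# Hyper-parameters as reported in the paper's experiment setup.
HIDDEN = 1024                        # GRU hidden units for E_mov, E_mtd, G_uni, G_dan
Z_INI, Z_MOV, Z_DAN = 10, 512, 512   # latent code dimensions
UNIT_LEN = 32                        # frames per dance unit (4 beat times each)

class GRUEncoder(nn.Module):
    """Single-layer GRU encoder in the style of E_mov / E_mtd.
    The input feature size depends on the pose/music features and is assumed here."""
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.gru = nn.GRU(in_dim, HIDDEN, num_layers=1, batch_first=True)
        self.fc = nn.Linear(HIDDEN, z_dim)

    def forward(self, x):             # x: (batch, time, in_dim)
        _, h = self.gru(x)            # h: (1, batch, HIDDEN), last hidden state
        return self.fc(h.squeeze(0))  # (batch, z_dim)

class MLPEncoder(nn.Module):
    """3 fully-connected layers, as described for E_ini / E_std / E_sty
    (hidden width and ReLU activations are assumptions)."""
    def __init__(self, in_dim, out_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """5 fully-connected layers with layer normalization, as described for D_dan / D_mov
    (hidden width and LeakyReLU activations are assumptions)."""
    def __init__(self, in_dim, hidden=512):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(4):
            layers += [nn.Linear(dim, hidden), nn.LayerNorm(hidden), nn.LeakyReLU(0.2)]
            dim = hidden
        layers.append(nn.Linear(dim, 1))  # 5th and final linear layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Reported optimizer settings: Adam, lr 1e-4, betas (0.5, 0.999).
POSE_DIM = 28                         # assumed placeholder for the pose feature size
e_mov = GRUEncoder(POSE_DIM, Z_MOV)
optimizer = torch.optim.Adam(e_mov.parameters(), lr=1e-4, betas=(0.5, 0.999))

# Quick shape check: encode one dance unit (32 frames) into z_mov.
x = torch.randn(2, UNIT_LEN, POSE_DIM)
z_mov = e_mov(x)                      # (2, 512)
```

Note that the reported batch size of 512 applies to dance units rather than full sequences; in the composition phase each training sequence concatenates 3 to 5 such units.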