Dancing to Music

Authors: Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental qualitative and quantitative results demonstrate that the proposed method can synthesize realistic, diverse, style-consistent, and beat-matching dances from music. We conduct extensive experiments to evaluate the proposed decomposition-to-composition framework. |
| Researcher Affiliation | Collaboration | 1. University of California, Merced; 2. NVIDIA |
| Pseudocode | No | The paper describes its methods in text and equations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Our data, code and models are publicly available at our website." |
| Open Datasets | Yes | "Finally, we provide a large-scale paired music and dance dataset, which is available along with the source code and models at our website." |
| Dataset Splits | No | The paper states "We randomly select 300 music clips for testing and the rest used for training" but does not detail a separate validation split or its size/percentage (see the split sketch after this table). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as PyTorch, OpenPose, and FFmpeg but does not provide version numbers for any of them. |
| Experiment Setup | Yes | "Our model is implemented in PyTorch. We use the gated recurrent unit (GRU) to build encoders E_mov, E_mtd and decoders G_uni, G_dan. Each of them is a single-layer GRU with 1024 hidden units. E_ini, E_std, and E_sty are encoders consisting of 3 fully-connected layers. D_dan and D_mov are discriminators containing 5 fully-connected layers with layer normalization. We set the latent code dimensions to z_ini ∈ R^10, z_mov ∈ R^512, and z_dan ∈ R^512. In the decomposition phase, we set the length of a dance unit as 32 frames and the number of beat times within a dance unit as 4. In the composition phase, each input sequence contains 3 to 5 dance units. For training, we use the Adam optimizer [19] with batch size of 512, learning rate of 0.0001, and exponential decay rates (β1, β2) = (0.5, 0.999). In all experiments, we set the hyper-parameters as follows: λ^u_KL = λ^d_KL = 0.01, λ^shift_recon = 1, λ^d_adv = λ^m_adv = 0.1, and λ^s_recon = 1." A hedged model sketch based on these settings follows the table. |
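
The Dataset Splits row fixes only one number: 300 music clips are held out for testing, with the rest used for training. Below is a minimal sketch of such a split; the function name, clip identifiers, and the fixed seed are assumptions for illustration, since the paper specifies neither a seed nor a shuffling procedure.

```python
import random

def split_clips(clip_ids, num_test=300, seed=0):
    """Hold out `num_test` randomly chosen clips for testing; train on the rest.

    The paper only states that 300 clips are selected at random for testing;
    the seed and shuffling procedure here are assumptions.
    """
    ids = list(clip_ids)
    random.Random(seed).shuffle(ids)
    return ids[num_test:], ids[:num_test]  # (train, test)

# Usage with placeholder clip identifiers (the real dataset uses paired
# music and dance clips released on the authors' website).
train_ids, test_ids = split_clips([f"clip_{i:04d}" for i in range(1000)])
```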
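
The Experiment Setup row pins down layer counts, hidden sizes, latent dimensions, and optimizer settings, but not the pose feature size or the exact wiring of the networks. A minimal PyTorch sketch under those stated settings follows; the class names, input dimension, activation choice, and forward-pass details are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

# Sizes quoted in the paper's experiment setup.
Z_INI, Z_MOV, Z_DAN = 10, 512, 512
GRU_HIDDEN = 1024

class GRUEncoder(nn.Module):
    """Single-layer GRU encoder with 1024 hidden units (cf. E_mov, E_mtd).

    The input feature size and the single linear head are assumptions;
    a VAE-style encoder would emit a mean and log-variance instead.
    """
    def __init__(self, in_dim, z_dim, hidden=GRU_HIDDEN):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, num_layers=1, batch_first=True)
        self.to_z = nn.Linear(hidden, z_dim)

    def forward(self, x):               # x: (batch, time, in_dim)
        _, h = self.gru(x)              # h: (1, batch, hidden)
        return self.to_z(h.squeeze(0))  # (batch, z_dim)

class MLPDiscriminator(nn.Module):
    """Five fully-connected layers with layer normalization (cf. D_dan, D_mov).

    The hidden width and LeakyReLU activation are assumptions; the paper
    states only the layer count and the use of layer normalization.
    """
    def __init__(self, in_dim, hidden=512):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(4):
            layers += [nn.Linear(d, hidden), nn.LayerNorm(hidden), nn.LeakyReLU(0.2)]
            d = hidden
        layers.append(nn.Linear(d, 1))  # fifth linear layer: real/fake score
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Optimizer settings quoted from the paper; the 28-D input is a placeholder.
enc = GRUEncoder(in_dim=28, z_dim=Z_MOV)
opt = torch.optim.Adam(enc.parameters(), lr=1e-4, betas=(0.5, 0.999))
```

The batch size of 512 and learning rate of 0.0001 from the paper plug directly into a standard training loop; everything not quoted in the setup row above should be treated as a guess.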