You Never Stop Dancing: Non-freezing Dance Generation via Bank-constrained Manifold Projection
Authors: Jiangxin Sun, Chunyu Wang, Huang Hu, Hanjiang Lai, Zhi Jin, Jian-Fang Hu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on AIST++, a public large-scale 3D dance motion benchmark, demonstrate that our method notably outperforms the baselines in terms of quality, diversity and time length. We conduct extensive experiments to evaluate our approach on the AIST++ dataset [20]. |
| Researcher Affiliation | Collaboration | Jiangxin Sun (Sun Yat-sen University, sunjx5@mail2.sysu.edu.cn); Chunyu Wang (Microsoft Research Asia, chnuwa@microsoft.com); Huang Hu (Peking University, tonyhu@pku.edu.cn); Hanjiang Lai (Sun Yat-sen University, laihanj3@mail.sysu.edu.cn); Zhi Jin (Sun Yat-sen University, jinzh26@mail.sysu.edu.cn); Jian-Fang Hu (Sun Yat-sen University, hujf5@mail.sysu.edu.cn) |
| Pseudocode | No | The paper describes its methods using mathematical formulations and architectural diagrams, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | Code and models will be released upon acceptance to ensure reproducibility. |
| Open Datasets | Yes | We evaluate our method on AIST++ [20], the largest dance dataset, which contains 60 music pieces belonging to 10 dance genres. In total, there are 992 3D pose sequences at 60 FPS. Following [20], we use 952 samples for training and the remaining 40 for evaluation. |
| Dataset Splits | No | The paper mentions using 952 samples for training and 40 for evaluation, but it does not explicitly state a separate 'validation' split with specific percentages or counts. |
| Hardware Specification | Yes | The whole training process takes about four days on four NVIDIA GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | The paper mentions using 'Librosa [23]' for music features and the 'Adam optimizer [15]', but it does not provide specific version numbers for these or any other software libraries/frameworks. (A hedged Librosa feature-extraction sketch follows the table.) |
| Experiment Setup | Yes | The model takes 240 frames of music and 120 frames of motion as input and predicts the next K = 20 frames of motion. For the encoders and decoders in the Refine Bank and Transit Bank, we use transformers with 4 layers and 10 attention heads with a 2048 hidden size. The manifold bank and the past-future bank each contain 256 items, and each item is a 2048-dim latent vector. In the first training stage, we adopt the Adam optimizer [15] with a learning rate of 1×10⁻⁴ to train the manifold bank for 50 epochs. In the second training stage, we pre-train the Refine Bank using the Adam optimizer with a learning rate of 1×10⁻⁴ for 25 epochs. In the third stage, we train the whole framework with the Adam optimizer for 150 epochs. The learning rate starts at 1×10⁻⁴ and decreases to 1×10⁻⁵ and 1×10⁻⁶ after 30 and 90 epochs, respectively. (A hedged PyTorch sketch of this configuration follows the table.) |
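Since the paper names Librosa but not the exact feature set or versions, the following is a minimal sketch of how its music features might be extracted. Everything beyond "Librosa is used" is an assumption: the choice of MFCC, chroma, and onset-strength features is a common recipe for music-to-dance models trained on AIST++, and the sample rate and hop length are picked here only so that the feature rate matches the 60 FPS motion data (15360 / 256 = 60 frames per second).

```python
# Hypothetical music feature extraction with Librosa.
# Feature set, sample rate, and hop length are assumptions, not taken
# from the paper; they are chosen so features align with 60 FPS motion.
import librosa
import numpy as np

def extract_music_features(path, sr=15360, hop_length=256):
    """Return a (n_frames, 33) array of per-frame music features."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, hop_length=hop_length)
    chroma = librosa.feature.chroma_cens(y=y, sr=sr, hop_length=hop_length)
    onset = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    # Trim to a common length before stacking (frame counts can differ by one).
    n = min(mfcc.shape[1], chroma.shape[1], onset.shape[0])
    feats = np.concatenate([mfcc[:, :n], chroma[:, :n], onset[None, :n]], axis=0)
    return feats.T  # (n_frames, 20 MFCC + 12 chroma + 1 onset)
```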
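The Experiment Setup row pins down enough hyperparameters for a rough reconstruction. Below is a minimal PyTorch sketch of the stated configuration, not the authors' implementation: the model width `d_model=800` is an assumption (PyTorch requires the width to be divisible by the 10 attention heads, and the quoted "2048 hidden size" is read here as the feed-forward dimension), and the bank is realized as an `nn.Embedding` codebook, which is one idiomatic way to hold 256 learnable 2048-dim latent items.

```python
# Minimal sketch of the quoted training configuration (stage 3).
# d_model=800 and the nn.Embedding bank are assumptions; only the layer
# count, head count, 2048 hidden size, bank shape, and LR schedule come
# from the paper's Experiment Setup description.
import torch
from torch import nn

encoder_layer = nn.TransformerEncoderLayer(
    d_model=800,           # assumed model width (must be divisible by nhead)
    nhead=10,              # 10 attention heads, as stated
    dim_feedforward=2048,  # "2048 hidden size", read as the FFN width
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)  # 4 layers

# Manifold / past-future bank: 256 items, each a 2048-dim latent vector.
bank = nn.Embedding(num_embeddings=256, embedding_dim=2048)

# Adam with lr starting at 1e-4, dropping to 1e-5 after epoch 30 and
# 1e-6 after epoch 90, over 150 epochs in total.
params = list(encoder.parameters()) + list(bank.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 90], gamma=0.1
)

for epoch in range(150):
    # ... one training epoch over inputs of 240 music frames and
    #     120 motion frames, predicting the next K = 20 motion frames ...
    scheduler.step()
```

`MultiStepLR` with `gamma=0.1` reproduces the quoted decay exactly: 1×10⁻⁴ for epochs 0-29, 1×10⁻⁵ for epochs 30-89, and 1×10⁻⁶ thereafter.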