SongCreator: Lyrics-based Universal Song Generation
Authors: Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks. |
| Researcher Affiliation | Collaboration | (1) Shenzhen International Graduate School, Tsinghua University, Shenzhen; (2) Independent Researcher; (3) The Chinese University of Hong Kong, Hong Kong SAR. {leis21, yx-zhou23}@mails.tsinghua.edu.cn, zywu@sz.tsinghua.edu.cn |
| Pseudocode | No | The paper describes the system architecture and process in figures and text but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | We are committed to advancing the field responsibly, and therefore, the checkpoints trained on the full dataset will not be released. |
| Open Datasets | Yes | We collected approximately 8500 hours of songs with lyrics from the internet for model training, comprising part of the DISCO-10M [69] dataset and some in-house datasets. |
| Dataset Splits | No | The paper states that DSLM is trained on 8,500 hours of song data split into 1.7M clips, and some experiments use a 'held-out set', but it does not provide specific percentages or counts for training, validation, and test splits for the main experiments. |
| Hardware Specification | Yes | During training, we train the DSLM for 500K steps using 8 NVIDIA A800 GPUs, with a batch size of 8 for each GPU. |
| Software Dependencies | No | The paper mentions various open-source libraries and models like BEST-RQ, Demucs, and GPT, along with their GitHub links, but does not specify the exact version numbers for these software dependencies (e.g., 'PyTorch 1.x' or 'Demucs vX.Y'). |
| Experiment Setup | Yes | During training, we train the DSLM for 500K steps using 8 NVIDIA A800 GPUs, with a batch size of 8 for each GPU. The Adam optimizer is used with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹, and follows the same learning rate schedule as in [66]. Consistently, top-k sampling is adopted for inference, in which k and temperature are set to 50 and 0.9, respectively. |
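
The "Experiment Setup" row above quotes the paper's optimizer and inference settings, but no reference implementation is released (see "Open Source Code"). The sketch below is a minimal illustration of how those stated hyperparameters could be wired up in PyTorch; it is not the authors' code. The DSLM is stood in for by a placeholder module, and the learning rate schedule cited as [66] is assumed here to be the inverse-square-root warmup schedule of Vaswani et al., with `d_model` and `warmup_steps` chosen arbitrarily since the paper does not report them.

```python
import torch

# Values quoted in the paper's experiment setup.
BETAS = (0.9, 0.98)   # Adam beta1, beta2
EPS = 1e-9            # Adam epsilon
TOP_K = 50            # top-k sampling cutoff at inference
TEMPERATURE = 0.9     # sampling temperature at inference


def noam_lr(step: int, d_model: int = 1024, warmup_steps: int = 4000) -> float:
    """Inverse-square-root warmup schedule (assumed from [66]);
    d_model and warmup_steps are illustrative assumptions."""
    step = max(step, 1)
    return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)


# Placeholder module standing in for the paper's DSLM.
dslm = torch.nn.Linear(1024, 1024)

optimizer = torch.optim.Adam(dslm.parameters(), lr=1.0, betas=BETAS, eps=EPS)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)


def sample_top_k(logits: torch.Tensor,
                 k: int = TOP_K,
                 temperature: float = TEMPERATURE) -> torch.Tensor:
    """Top-k sampling with temperature (k=50, T=0.9), applied to the
    logits of one autoregressive step; returns sampled token indices."""
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx.gather(-1, choice)
```

In use, `sample_top_k` would be called on the model's logits at each decoding step; the actual DSLM architecture, tokenization, and decoding loop cannot be reconstructed from the paper alone, which is what the "Open Source Code: No" entry reflects.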