SongCreator: Lyrics-based Universal Song Generation

Authors: Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks. |
| Researcher Affiliation | Collaboration | 1 Shenzhen International Graduate School, Tsinghua University, Shenzhen; 2 Independent Researcher; 3 The Chinese University of Hong Kong, Hong Kong SAR. Contact: {leis21, yx-zhou23}@mails.tsinghua.edu.cn, zywu@sz.tsinghua.edu.cn |
| Pseudocode | No | The paper describes the system architecture and process in figures and text but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | We are committed to advancing the field responsibly, and therefore, the checkpoints trained on the full dataset will not be released. |
| Open Datasets | Yes | We collected approximately 8,500 hours of songs with lyrics from the internet for model training, comprising part of the DISCO-10M [69] dataset and some in-house datasets. |
| Dataset Splits | No | The paper states that the DSLM is trained on 8,500 hours of song data split into 1.7M clips and that some experiments use a held-out set, but it does not provide specific percentages or counts for the training, validation, and test splits used in the main experiments. |
| Hardware Specification | Yes | During training, we train the DSLM for 500K steps using 8 NVIDIA A800 GPUs, with a batch size of 8 for each GPU. |
| Software Dependencies | No | The paper mentions various open-source libraries and models such as BEST-RQ, Demucs, and GPT, along with their GitHub links, but does not specify exact version numbers for these dependencies (e.g., "PyTorch 1.x" or "Demucs vX.Y"). |
| Experiment Setup | Yes | During training, we train the DSLM for 500K steps using 8 NVIDIA A800 GPUs, with a batch size of 8 for each GPU. Adam optimizer is used with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹, following the same learning rate schedule as in [66]. Consistently, top-k sampling is adopted for inference, in which k and temperature are set to 50 and 0.9, respectively. |
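
The reported Experiment Setup can be summarized in a short sketch. The snippet below is illustrative only: the optimizer hyperparameters (β1 = 0.9, β2 = 0.98, ϵ = 1e-9), the 500K-step budget, and the top-k sampling settings (k = 50, temperature = 0.9) are taken from the row above, while the placeholder model, the `inverse_sqrt_warmup` schedule standing in for the schedule cited as [66], and all other names are assumptions, not the authors' (unreleased) code.

```python
# Sketch of the reported training/inference hyperparameters.
# Quoted from the paper: Adam(beta1=0.9, beta2=0.98, eps=1e-9), 500K steps,
# top-k sampling with k=50 and temperature=0.9.
# Assumed: the placeholder model and the warmup schedule below.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(1024, 1024)  # placeholder standing in for the unreleased DSLM

# Adam as reported; base lr = 1.0 so the LambdaLR factor below is the effective rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)

def inverse_sqrt_warmup(step, d_model=1024, warmup=4000):
    """Assumed stand-in for the learning-rate schedule cited as [66]:
    linear warmup followed by inverse-square-root decay."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt_warmup)

def sample_top_k(logits, k=50, temperature=0.9):
    """Top-k sampling at inference with k=50 and temperature=0.9, as reported."""
    scaled = logits / temperature
    top_vals, top_idx = torch.topk(scaled, k, dim=-1)   # keep the k largest logits
    probs = F.softmax(top_vals, dim=-1)                  # renormalize over the top-k
    choice = torch.multinomial(probs, num_samples=1)     # draw one token per sequence
    return top_idx.gather(-1, choice)
```

With k = 50 and temperature = 0.9, sampling is restricted to the 50 most likely tokens after mildly sharpening the distribution, matching the inference setting quoted in the Experiment Setup row; the 8×A800, batch-size-8-per-GPU training configuration is a distributed detail omitted from this sketch.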