SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint
Authors: Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin
AAAI 2021, pp. 13798-13805 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results with objective and subjective evaluations demonstrate that SongMASS significantly improves the quality of lyric and melody generation with the help of pre-training and alignment constraint. |
| Researcher Affiliation | Collaboration | National Engineering Research Center for Software Engineering, Peking University; Nanjing University of Science and Technology; Microsoft Research Asia; Zhejiang University |
| Pseudocode | Yes | Algorithm 1 DP for Melody-Lyric Alignment (an illustrative alignment sketch appears after this table) |
| Open Source Code | No | Melody and lyric samples are available at: https://musicgeneration.github.io/SongMASS/ - This link provides samples, not the source code for the methodology. |
| Open Datasets | Yes | We use 380,000+ lyrics from MetroLyrics as our unpaired lyrics for pre-training, which contains 362,237 songs. The lyrics in each song are split into sentences by the line break. For unpaired melodies, we choose The Lakh MIDI Dataset (Raffel 2016). We extract the melody tracks by Midi-miner and finally get 65,954 melodies as our unpaired data for pre-training. ... We use the LMD dataset (Yu and Canales 2019), which contains aligned melodies and lyrics from 7,998 songs. |
| Dataset Splits | Yes | The dataset is split as training/valid/test set with a ratio of 8:1:1. |
| Hardware Specification | Yes | The model is trained on an NVIDIA Tesla T4 GPU card |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) were explicitly mentioned. |
| Experiment Setup | Yes | We choose Transformer (Vaswani et al. 2017) as our basic model structure, which consists of 6 encoder/decoder layers. The hidden size and filter size of each layer are set as 512 and 2048. The number of attention heads is 8. We use the same masking strategy as in Song et al. (2019). We use Adam optimizer (Kingma and Ba 2015) with a learning rate of 5e-4. The model is trained on an NVIDIA Tesla T4 GPU card, and each mini-batch contains 4096 tokens. The hyper-parameter α is set as 0.5. The dataset is split as training/valid/test set with a ratio of 8:1:1. (A configuration sketch with these hyper-parameters appears after this table.) |
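
The pseudocode row only names Algorithm 1; the excerpt does not reproduce it. As a rough illustration of what a dynamic-programming melody-lyric alignment can look like, the sketch below finds a score-maximizing monotonic path through a lyric-token-by-melody-note attention (or similarity) matrix. This is a generic monotonic-alignment DP written for this page, assuming a NumPy matrix `attn`; it is not the paper's Algorithm 1, whose exact constraints and scoring are not given in this excerpt.

```python
import numpy as np

def monotonic_alignment(attn: np.ndarray) -> list:
    """Find a monotonic path through attn[i, j] (lyric token i vs. melody
    note j) that maximizes the summed score. Moves are restricted so that
    neither the token index nor the note index can decrease."""
    n_tok, n_note = attn.shape
    score = np.full((n_tok, n_note), -np.inf)
    back = np.zeros((n_tok, n_note, 2), dtype=int)
    score[0, 0] = attn[0, 0]

    for i in range(n_tok):
        for j in range(n_note):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:
                candidates.append((score[i - 1, j], (i - 1, j)))          # several tokens on one note
            if j > 0:
                candidates.append((score[i, j - 1], (i, j - 1)))          # one token spans several notes
            if i > 0 and j > 0:
                candidates.append((score[i - 1, j - 1], (i - 1, j - 1)))  # advance token and note together
            best, prev = max(candidates, key=lambda c: c[0])
            score[i, j] = best + attn[i, j]
            back[i, j] = prev

    # Trace the best path back from the last token/note pair to (0, 0).
    path, i, j = [], n_tok - 1, n_note - 1
    while True:
        path.append((i, j))
        if i == 0 and j == 0:
            break
        i, j = back[i, j]
    return path[::-1]

# Example: align 5 hypothetical lyric tokens to 12 melody notes.
attn = np.random.rand(5, 12)
print(monotonic_alignment(attn))
```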
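
For the experiment-setup row, the sketch below writes the reported hyper-parameters (6 encoder/decoder layers, hidden size 512, filter size 2048, 8 attention heads, Adam with learning rate 5e-4) as a minimal PyTorch configuration. This is an assumption-laden sketch, not the authors' code: the framework, embedding layer, and vocabulary size are placeholders, and the masking strategy, 4096-token batching, and loss weight α = 0.5 are training-loop details not shown here.

```python
import torch
from torch import nn

# Hyper-parameters quoted in the table above.
D_MODEL, FFN_DIM, N_HEADS, N_LAYERS, LR = 512, 2048, 8, 6, 5e-4
VOCAB_SIZE = 10_000  # placeholder; not reported in the excerpt

# Standard encoder-decoder Transformer with the reported dimensions.
model = nn.Transformer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    num_encoder_layers=N_LAYERS,
    num_decoder_layers=N_LAYERS,
    dim_feedforward=FFN_DIM,
    batch_first=True,
)
embedding = nn.Embedding(VOCAB_SIZE, D_MODEL)  # token embedding (assumed, not stated in the excerpt)

optimizer = torch.optim.Adam(
    list(model.parameters()) + list(embedding.parameters()), lr=LR
)
```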