Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

S²MILE: Semantic-and-Structure-Aware Music-Driven Lyric Generation

Authors: Mu You, Fang Zhang, Shuai Zhang, Linli Xu

AAAI 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results on objective and subjective benchmarks demonstrate the capabilities of our proposed model in capturing semantics and generating well-formatted lyrics. |
| Researcher Affiliation | Academia | Mu You (1,2), Fang Zhang (1,2), Shuai Zhang (1), Linli Xu (1,2)*; 1: School of Computer Science and Technology, University of Science and Technology of China; 2: State Key Laboratory of Cognitive Intelligence; EMAIL, EMAIL |
| Pseudocode | No | The paper includes figures illustrating the model architecture (Figures 2, 3, and 4) and mathematical equations, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper mentions other models such as SongMASS and SongComposer as 'open-source', but there is no explicit statement or link provided for the authors' own described methodology or code. |
| Open Datasets | Yes | In this section, we conduct our experiments on the MetaMIDI dataset (Ens and Pasquier 2021), which comprises 436,631 tracks in MIDI format. |
| Dataset Splits | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Hardware Specification | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Software Dependencies | No | The paper mentions several models and techniques (HTS-AT, BERT, T5-base, Mistral-7B, LoRA, Adam) but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | Song-level Extractor: learning rate 1×10⁻⁵, batch size 42, processing up to 24 audio tokens and 108 text tokens per batch. Sentence-level Extractor: learning rate 1×10⁻⁴, batch size 12, processing 256 musical notes and 32 text tokens per batch. Lyric Length Predictor: learning rate 1×10⁻⁴, batch size 12, processing up to 256 musical notes per batch. Lyric Generator: learning rate 2×10⁻⁴, batch size 2, with 84 gradient accumulation steps. We optimize the modules with Adam (Kingma and Ba 2015). |
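The 8:1:1 train/validation/test split quoted above can be reproduced with a simple shuffled partition. A minimal sketch in Python; the paper does not specify the exact splitting procedure or seed, so the function below is illustrative:

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle `items` and partition them 8:1:1 into train/val/test.
    The seed and shuffling scheme are assumptions, not the paper's."""
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    n_train = int(len(items) * 0.8)
    n_val = int(len(items) * 0.1)
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

Any remainder after the 80%/10% cuts falls into the test set, so the three parts always cover the full dataset exactly once.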
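The Lyric Generator row describes a micro-batch size of 2 with 84 gradient accumulation steps, i.e. one optimizer update per 168 examples. A minimal sketch of that schedule on a toy scalar loss; plain SGD stands in for Adam, and the loss and variable names are illustrative, not the paper's:

```python
def accumulated_step(w, micro_batch_targets, lr=2e-4, accum_steps=84):
    """One optimizer step after accumulating gradients over `accum_steps`
    micro-batches. Toy per-micro-batch loss: 0.5 * (w - y)**2, whose
    gradient with respect to w is (w - y)."""
    assert len(micro_batch_targets) == accum_steps
    grad_sum = 0.0
    for y in micro_batch_targets:
        grad_sum += (w - y)            # accumulate micro-batch gradients
    grad = grad_sum / accum_steps      # average over the accumulation window
    return w - lr * grad               # single parameter update
```

Accumulation lets a memory-limited GPU such as a single RTX 3090 emulate a large effective batch (2 × 84 = 168) while only ever holding two examples' activations at a time.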