Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
S²MILE: Semantic-and-Structure-Aware Music-Driven Lyric Generation
Authors: Mu You, Fang Zhang, Shuai Zhang, Linli Xu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on objective and subjective benchmarks demonstrate the capabilities of our proposed model in capturing semantics and generating well-formatted lyrics. |
| Researcher Affiliation | Academia | Mu You (1,2), Fang Zhang (1,2), Shuai Zhang (1), Linli Xu (1,2)*; 1: School of Computer Science and Technology, University of Science and Technology of China; 2: State Key Laboratory of Cognitive Intelligence. EMAIL, EMAIL |
| Pseudocode | No | The paper includes figures illustrating the model architecture (Figure 2, 3, 4) and mathematical equations, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper mentions other models such as SongMASS and SongComposer as 'open-source', but provides no explicit statement or link releasing code for the authors' own method. |
| Open Datasets | Yes | In this section, we conduct our experiments on the Meta MIDI dataset (Ens and Pasquier 2021), which comprises 436,631 tracks in MIDI format. |
| Dataset Splits | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Hardware Specification | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Software Dependencies | No | The paper mentions several models and techniques (HTS-AT, BERT, T5-base, Mistral-7B, LoRA, Adam) but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | Song-level Extractor: learning rate 1×10⁻⁵, batch size 42, processing up to 24 audio tokens and 108 text tokens per batch. Sentence-level Extractor: learning rate 1×10⁻⁴, batch size 12, processing 256 musical notes and 32 text tokens per batch. Lyric Length Predictor: learning rate 1×10⁻⁴, batch size 12, processing up to 256 musical notes per batch. Lyric Generator: learning rate 2×10⁻⁴, batch size 2, with 84 gradient accumulation steps. We optimize modules with Adam (Kingma and Ba 2015). |
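The reported 8:1:1 split over the 436,631 Meta MIDI tracks can be reproduced as a deterministic shuffle-and-cut. This is a minimal sketch only: the function name, the fixed seed, and the use of Python's `random` module are assumptions, not details stated in the paper.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle deterministically, then cut into train/val/test by ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder absorbs rounding leftovers
    return train, val, test

# 436,631 tracks, as in the Meta MIDI dataset figure quoted above.
train, val, test = split_dataset(range(436631))
print(len(train), len(val), len(test))  # → 349304 43663 43664
```

Integer division makes the cut points reproducible; the leftover items from rounding fall into the test partition.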
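The Lyric Generator row implies an effective batch size of 2 × 84 = 168 via gradient accumulation. Below is a toy scalar sketch of that pattern with a hand-written Adam update (Kingma and Ba 2015); the function names, the quadratic loss, and all hyperparameters other than the learning rate 2×10⁻⁴ and 84 accumulation steps are illustrative assumptions.

```python
def adam_step(param, grad, state, lr=2e-4, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update with bias-corrected first/second moment estimates.
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return param - lr * m_hat / (v_hat ** 0.5 + eps)

def train_step(param, micro_batches, state, accum_steps=84):
    # Accumulate the mean gradient over `accum_steps` micro-batches
    # (batch size 2 each), then apply a single optimizer update.
    grad = 0.0
    for batch in micro_batches[:accum_steps]:
        for x in batch:  # toy loss per sample: (param - x)^2
            grad += 2 * (param - x) / (accum_steps * len(batch))
    return adam_step(param, grad, state)

state = {"t": 0, "m": 0.0, "v": 0.0}
batches = [(1.0, 1.0)] * 84              # 84 micro-batches of size 2
param = train_step(0.0, batches, state)  # one update over 168 samples
```

Averaging the gradient across all 168 samples before the single `adam_step` call is what makes the accumulated update equivalent to one large-batch step.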