Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
S²MILE: Semantic-and-Structure-Aware Music-Driven Lyric Generation
Authors: Mu You, Fang Zhang, Shuai Zhang, Linli Xu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on objective and subjective benchmarks demonstrate the capabilities of our proposed model in capturing semantics and generating well-formatted lyrics. |
| Researcher Affiliation | Academia | Mu You¹,², Fang Zhang¹,², Shuai Zhang¹, Linli Xu¹,²\*; ¹School of Computer Science and Technology, University of Science and Technology of China; ²State Key Laboratory of Cognitive Intelligence; EMAIL, EMAIL |
| Pseudocode | No | The paper includes figures illustrating the model architecture (Figure 2, 3, 4) and mathematical equations, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper describes other models, such as SongMASS and SongComposer, as 'open-source', but it provides no explicit statement or link for the authors' own code. |
| Open Datasets | Yes | In this section, we conduct our experiments on the MetaMIDI dataset (Ens and Pasquier 2021), which comprises 436,631 tracks in MIDI format. |
| Dataset Splits | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Hardware Specification | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Software Dependencies | No | The paper mentions several models and techniques (HTS-AT, BERT, T5-base, Mistral-7B, LoRA, Adam) but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | Song-level Extractor: 1 × 10⁻⁵, batch size 42, processing up to 24 audio tokens and 108 text tokens per batch. Sentence-level Extractor: 1 × 10⁻⁴, batch size 12, processing 256 musical notes and 32 text tokens per batch. Lyric Length Predictor: 1 × 10⁻⁴, batch size 12, processing up to 256 musical notes per batch. Lyric Generator: 2 × 10⁻⁴, batch size 2, with 84 gradient accumulation steps. We optimize modules with Adam (Kingma and Ba 2015). |
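
The settings quoted in the Experiment Setup row can be gathered into a small configuration sketch, which makes the effective batch size implied by gradient accumulation and the sizes implied by an 8:1:1 split easy to check. This is an illustration only, not the authors' code: the class, field, and key names are hypothetical, and reading the per-module values (1 × 10⁻⁵, 1 × 10⁻⁴, 2 × 10⁻⁴) as Adam learning rates is an assumption, since the quoted text does not label them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleSetup:
    """Reported settings for one module (hypothetical field names)."""
    step_size: float            # ASSUMPTION: interpreted as the Adam learning rate
    batch_size: int
    grad_accum_steps: int = 1   # 1 when no accumulation is reported

    @property
    def effective_batch_size(self) -> int:
        # Number of examples contributing to each optimizer update.
        return self.batch_size * self.grad_accum_steps


# Values copied from the Experiment Setup row; the dictionary keys are hypothetical.
SETUPS = {
    "song_level_extractor": ModuleSetup(1e-5, 42),
    "sentence_level_extractor": ModuleSetup(1e-4, 12),
    "lyric_length_predictor": ModuleSetup(1e-4, 12),
    "lyric_generator": ModuleSetup(2e-4, 2, grad_accum_steps=84),
}


def split_sizes(n_items: int, ratios=(8, 1, 1)) -> tuple:
    """Split n_items by integer ratios; any remainder goes to the first part."""
    total = sum(ratios)
    sizes = [n_items * r // total for r in ratios]
    sizes[0] += n_items - sum(sizes)
    return tuple(sizes)


if __name__ == "__main__":
    for name, setup in SETUPS.items():
        print(name, setup.effective_batch_size)
    print(split_sizes(1000))
```

Under these assumptions, the lyric generator's batch size of 2 with 84 accumulation steps corresponds to 168 examples per optimizer update. The split helper is shown on an arbitrary item count because the table does not state how many examples were actually used for training.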