Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

S²MILE: Semantic-and-Structure-Aware Music-Driven Lyric Generation

Authors: Mu You, Fang Zhang, Shuai Zhang, Linli Xu

AAAI 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results on objective and subjective benchmarks demonstrate the capabilities of our proposed model in capturing semantics and generating well-formatted lyrics. |
| Researcher Affiliation | Academia | Mu You (1,2), Fang Zhang (1,2), Shuai Zhang (1), Linli Xu (1,2)*; 1: School of Computer Science and Technology, University of Science and Technology of China; 2: State Key Laboratory of Cognitive Intelligence; EMAIL, EMAIL |
| Pseudocode | No | The paper includes figures illustrating the model architecture (Figures 2, 3, and 4) and mathematical equations, but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper mentions other models such as SongMASS and SongComposer as 'open-source', but there is no explicit statement or link provided for the authors' own described methodology or code. |
| Open Datasets | Yes | In this section, we conduct our experiments on the MetaMIDI dataset (Ens and Pasquier 2021), which comprises 436,631 tracks in MIDI format. |
| Dataset Splits | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Hardware Specification | Yes | Training is conducted on a single NVIDIA RTX 3090 GPU, using an 8:1:1 split for the training, validation, and test sets. |
| Software Dependencies | No | The paper mentions several models and techniques (HTS-AT, BERT, T5-base, Mistral-7B, LoRA, Adam) but does not provide specific version numbers for any software libraries or frameworks used. |
| Experiment Setup | Yes | Song-level Extractor: learning rate 1×10⁻⁵, batch size 42, processing up to 24 audio tokens and 108 text tokens per batch. Sentence-level Extractor: learning rate 1×10⁻⁴, batch size 12, processing 256 musical notes and 32 text tokens per batch. Lyric Length Predictor: learning rate 1×10⁻⁴, batch size 12, processing up to 256 musical notes per batch. Lyric Generator: learning rate 2×10⁻⁴, batch size 2, with 84 gradient accumulation steps. We optimize the modules with Adam (Kingma and Ba 2015). |
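The 8:1:1 train/validation/test split quoted above can be reproduced with a simple shuffled partition. A minimal sketch in Python; the paper does not specify the exact splitting procedure or seed, so the function below is illustrative:

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle `items` and partition them 8:1:1 into train/val/test.
    The seed and shuffling scheme are assumptions, not the paper's."""
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    n_train = int(len(items) * 0.8)
    n_val = int(len(items) * 0.1)
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

Any remainder after the 80%/10% cuts falls into the test set, so the three parts always cover the full dataset exactly once.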
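The Lyric Generator row describes a micro-batch size of 2 with 84 gradient accumulation steps, i.e. one optimizer update per 168 examples. A minimal sketch of that schedule on a toy scalar loss; plain SGD stands in for Adam, and the loss and variable names are illustrative, not the paper's:

```python
def accumulated_step(w, micro_batch_targets, lr=2e-4, accum_steps=84):
    """One optimizer step after accumulating gradients over `accum_steps`
    micro-batches. Toy per-micro-batch loss: 0.5 * (w - y)**2, whose
    gradient with respect to w is (w - y)."""
    assert len(micro_batch_targets) == accum_steps
    grad_sum = 0.0
    for y in micro_batch_targets:
        grad_sum += (w - y)            # accumulate micro-batch gradients
    grad = grad_sum / accum_steps      # average over the accumulation window
    return w - lr * grad               # single parameter update
```

Accumulation lets a memory-limited GPU such as a single RTX 3090 emulate a large effective batch (2 × 84 = 168) while only ever holding two examples' activations at a time.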