Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling

Authors: Jingwei Zhao, Gus Xia, Ziyu Wang, Ye Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of our multi-track accompaniment system. Given that existing methods primarily focus on lead sheet to multi-track arrangement, we ensure a fair comparison by using the two-stage approach discussed in Section 4. In Section 5.1, we present the datasets used and the training details of our model. In Section 5.2, we describe the baseline models used for comparison. Our evaluation is divided into two parts: objective evaluation, detailed in Section 5.3, and subjective evaluation, covered in Section 5.4. For the single-stage piano to multi-track (Stage 2) and lead sheet to piano (Stage 1) arrangement tasks, we perform additional comparisons with various ablation architectures in Sections 5.5 and 5.6, respectively.
Researcher Affiliation | Academia | Jingwei Zhao (1,3), Gus Xia (4,5), Ziyu Wang (5,4), Ye Wang (2,1,3). 1: Institute of Data Science, NUS; 2: School of Computing, NUS; 3: Integrative Sciences and Engineering Programme, NUS Graduate School; 4: Machine Learning Department, MBZUAI; 5: Computer Science Department, NYU Shanghai.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as such.
Open Source Code | Yes | We release our code and more resources at https://github.com/zhaojw1998/Structured-Arrangement-Code.
Open Datasets | Yes | We use two datasets to train the autoencoder and the style prior, respectively. The autoencoder is trained with Slakh2100 [25], which contains 2K curated multi-track pieces with 34 instrument classes in a balanced distribution. [...] We use the Lakh MIDI Dataset (LMD) [28] to train the prior model. It contains 170k music pieces and is a benchmark dataset for training music generative models.
Dataset Splits | Yes | We use the official training split and augment training samples by transposing to all 12 keys. [...] We collect 2/4 and 4/4 pieces (110k after processing) and randomly split LMD at song level into training (95%) and validation (5%) sets. (See the data-preparation sketch after the table.)
Hardware Specification | Yes | The autoencoder comprises 19M learnable parameters and is trained with batch size 64 for 30 epochs on an RTX A5000 GPU with 24GB memory. [...] Our prior model has 30M parameters and is trained with batch size 16 for 10 epochs (600K iterations) on two RTX A5000 GPUs.
Software Dependencies | No | The paper mentions using the "Adam optimizer [19]" and the "AdamW optimizer [22]" but does not specify version numbers for these or other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | The autoencoder comprises 19M learnable parameters and is trained with batch size 64 for 30 epochs on an RTX A5000 GPU with 24GB memory. We use the Adam optimizer [19] with a learning rate exponentially decayed from 1e-3 to 1e-5. We use an exponential moving average (EMA) [29] and random restart [7] to update the codebook with commitment ratio β = 0.25. Our prior model has 30M parameters and is trained with batch size 16 for 10 epochs (600K iterations) on two RTX A5000 GPUs. We apply the AdamW optimizer [22] with a learning rate of 1e-4, scheduled by a 1k-step linear warm-up followed by a single cycle of cosine decay to a final rate of 1e-6. (See the codebook-update and learning-rate-schedule sketches after the table.)
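
The "Dataset Splits" row describes 12-key transposition augmentation and a random 95%/5% song-level split of LMD. Below is a minimal sketch of such preprocessing; the (pitch, onset, duration) note representation and the function names are illustrative assumptions, not the authors' released pipeline.

```python
import random

def transpose_to_all_keys(notes):
    """Augment one piece by transposing it to all 12 keys.

    `notes` is assumed to be a list of (pitch, onset, duration) tuples;
    this representation is illustrative, not the paper's actual format.
    """
    augmented = []
    for shift in range(12):  # 0..11 semitones, i.e. all 12 keys
        augmented.append([(pitch + shift, onset, dur) for pitch, onset, dur in notes])
    return augmented

def split_song_level(song_ids, train_ratio=0.95, seed=0):
    """Randomly split song IDs into training (95%) and validation (5%) at the song level."""
    ids = list(song_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]
```

Splitting by song ID rather than by segment keeps all excerpts of one piece on the same side of the split, which is the usual way to avoid leakage between training and validation.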
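
The "Experiment Setup" row states that the autoencoder's codebook is updated with an exponential moving average (EMA) and random restart, using commitment ratio β = 0.25. Below is a minimal PyTorch sketch of an EMA-updated vector quantizer with these two mechanisms; only β = 0.25 comes from the paper, while the codebook size, EMA decay, and restart threshold are assumed values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMAVectorQuantizer(nn.Module):
    """Vector quantizer with EMA codebook updates, commitment loss, and random restart."""

    def __init__(self, num_codes=256, dim=64, beta=0.25, decay=0.99, restart_threshold=0.03):
        super().__init__()
        self.beta = beta                            # commitment ratio (0.25, from the paper)
        self.decay = decay                          # EMA decay (assumed value)
        self.restart_threshold = restart_threshold  # usage floor for random restart (assumed)
        self.register_buffer("codebook", torch.randn(num_codes, dim))
        self.register_buffer("cluster_size", torch.ones(num_codes))
        self.register_buffer("embed_avg", self.codebook.clone())

    def forward(self, z_e):
        # z_e: (batch, dim) encoder outputs; flatten any extra axes beforehand.
        dist = torch.cdist(z_e, self.codebook)      # (batch, num_codes) pairwise distances
        codes = dist.argmin(dim=1)                  # nearest-code indices
        z_q = self.codebook[codes]                  # quantized vectors

        if self.training:
            with torch.no_grad():
                one_hot = F.one_hot(codes, self.codebook.size(0)).type_as(z_e)
                # EMA update of per-code usage counts and running sums of assigned vectors.
                self.cluster_size.mul_(self.decay).add_(one_hot.sum(0), alpha=1 - self.decay)
                self.embed_avg.mul_(self.decay).add_(one_hot.t() @ z_e, alpha=1 - self.decay)
                n = self.cluster_size.sum()
                smoothed = (self.cluster_size + 1e-5) / (n + self.codebook.size(0) * 1e-5) * n
                self.codebook.copy_(self.embed_avg / smoothed.unsqueeze(1))
                # Random restart: re-seed rarely used codes with random encoder outputs.
                dead = self.cluster_size < self.restart_threshold
                if dead.any():
                    idx = torch.randint(0, z_e.size(0), (int(dead.sum()),), device=z_e.device)
                    self.codebook[dead] = z_e[idx]
                    self.embed_avg[dead] = z_e[idx]
                    self.cluster_size[dead] = 1.0

        # Commitment loss pulls encoder outputs toward their codes (beta = 0.25).
        commit_loss = self.beta * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator so gradients flow back through the encoder.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, codes, commit_loss
```

In a typical VQ-VAE setup, this module sits between encoder and decoder, and `commit_loss` is added to the reconstruction loss; the codebook itself is trained only through the EMA statistics, not by gradient descent.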
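
The same row also specifies the optimizer settings: Adam with a learning rate decayed exponentially from 1e-3 to 1e-5 over 30 epochs for the autoencoder, and AdamW at 1e-4 with a 1k-step linear warm-up followed by a single cosine cycle down to 1e-6 over 600K iterations for the prior. The sketch below shows one way to realize these schedules with standard PyTorch schedulers; the placeholder models and the per-epoch vs. per-step stepping convention are assumptions.

```python
import torch
from torch.optim import Adam, AdamW
from torch.optim.lr_scheduler import ExponentialLR, LinearLR, CosineAnnealingLR, SequentialLR

# --- Autoencoder: Adam, lr decayed exponentially from 1e-3 to 1e-5 over 30 epochs ---
autoencoder = torch.nn.Linear(64, 64)             # placeholder for the real autoencoder
ae_opt = Adam(autoencoder.parameters(), lr=1e-3)
gamma = (1e-5 / 1e-3) ** (1 / 30)                 # per-epoch factor: 1e-3 * gamma**30 == 1e-5
ae_sched = ExponentialLR(ae_opt, gamma=gamma)     # call ae_sched.step() once per epoch

# --- Prior: AdamW, lr 1e-4, 1k-step linear warm-up, then cosine decay to 1e-6 ---
prior = torch.nn.Linear(64, 64)                   # placeholder for the real prior model
total_steps, warmup_steps = 600_000, 1_000
prior_opt = AdamW(prior.parameters(), lr=1e-4)
warmup = LinearLR(prior_opt, start_factor=1e-3, end_factor=1.0, total_iters=warmup_steps)
cosine = CosineAnnealingLR(prior_opt, T_max=total_steps - warmup_steps, eta_min=1e-6)
prior_sched = SequentialLR(prior_opt, schedulers=[warmup, cosine], milestones=[warmup_steps])
# call prior_sched.step() once per optimizer step (600K iterations in total)
```

With these numbers, the autoencoder's exponential factor works out to (1e-5 / 1e-3)^(1/30) ≈ 0.858 per epoch, and the prior's cosine cycle spans the 599K steps remaining after warm-up.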