Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Authors: Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue

ICML 2024 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on a wide range of symbolic music generation tasks: unconditional generation (Sec 6.2), individual rule guidance (Sec 6.3), composite rule guidance (Sec 6.4) and editing (Appendix C.2). We perform ablation studies in Sec 6.5 and subjective evaluation in Sec 6.6.
Researcher Affiliation | Collaboration | 1California Institute of Technology 2Rensselaer Polytechnic Institute 3NVIDIA 4Dalhousie University 5Vector Institute
Pseudocode | Yes | Algorithm 1: Stochastic Control Guided DDPM sampling; Algorithm 3: Stochastic Control Guided stochastic DDIM sampling
Open Source Code | Yes | For detailed demonstrations, code and model checkpoints, please visit our project website.
Open Datasets | Yes | We train our model on several piano MIDI datasets that cover both classical and pop genres. The MAESTRO dataset (Hawthorne et al., 2019) has about 1200 pieces of classical piano performances... In addition, we crawled about 14k MIDI files from Muscore... We also used two pop piano datasets: Pop1k7 (Hsiao et al., 2021) and Pop909 (Wang et al., 2020)...
Dataset Splits | No | The paper names the datasets used for training and testing, but does not give the explicit train/validation/test splits (e.g., percentages, counts, or predefined split IDs) needed for reproduction.
Hardware Specification | Yes | All experiments are run on NVIDIA A100-SXM4 GPUs.
Software Dependencies | No | The paper mentions specific software components such as AdamW (Loshchilov & Hutter, 2019) and the DiT architecture (Peebles & Xie, 2023), but does not give version numbers for these or any other software dependencies.
Experiment Setup | Yes | The model was trained for 800k steps. The KL scheduler λ_KL(t) increased linearly from 0 to 1e-2 over 400k steps, and the denoising scheduler λ_denoise increased the perturbation fraction linearly from 0 to 25% over 400k steps. Training used a cosine learning rate scheduler with a 10k-step warmup peaking at 5e-4, and the AdamW optimizer (Loshchilov & Hutter, 2019) with weight decay 0.01 and batch size 80.
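The schedules reported in the Experiment Setup row map cleanly onto step-to-value functions. The sketch below is a hypothetical reconstruction (function names are ours, not the authors'): a cosine learning-rate schedule with linear 10k-step warmup peaking at 5e-4, plus the two linear ramps for λ_KL (0 to 1e-2 over 400k steps) and the perturbation fraction (0 to 25% over 400k steps).

```python
import math

# Hyperparameters as reported in the paper's experiment setup.
PEAK_LR = 5e-4         # peak learning rate
WARMUP_STEPS = 10_000  # linear warmup length
TOTAL_STEPS = 800_000  # total training steps
RAMP_STEPS = 400_000   # ramp length for both lambda schedules

def cosine_lr(step: int) -> float:
    """Cosine learning-rate schedule with linear warmup (hypothetical helper)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

def lambda_kl(step: int) -> float:
    """KL weight: linear ramp from 0 to 1e-2 over the first 400k steps."""
    return 1e-2 * min(step / RAMP_STEPS, 1.0)

def perturb_fraction(step: int) -> float:
    """Denoising perturbation fraction: linear ramp from 0 to 25%."""
    return 0.25 * min(step / RAMP_STEPS, 1.0)
```

For instance, `cosine_lr(10_000)` returns the 5e-4 peak, and both ramps saturate at step 400k (`lambda_kl(400_000)` is 1e-2). How these scalars are consumed by the training loop is not specified in the excerpt above.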