Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
Authors: Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli Shama Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a wide range of symbolic music generation tasks: unconditional generation (Sec 6.2), individual rule guidance (Sec 6.3), composite rule guidance (Sec 6.4) and editing (Appendix C.2). We perform ablation studies in Sec 6.5 and subjective evaluation in Sec 6.6. |
| Researcher Affiliation | Collaboration | ¹California Institute of Technology, ²Rensselaer Polytechnic Institute, ³NVIDIA, ⁴Dalhousie University, ⁵Vector Institute. |
| Pseudocode | Yes | Algorithm 1: Stochastic Control Guided DDPM sampling; Algorithm 3: Stochastic Control Guided stochastic DDIM sampling (see the sampling sketch after the table). |
| Open Source Code | Yes | For detailed demonstrations, code and model checkpoints, please visit our project website. |
| Open Datasets | Yes | We train our model on several piano midi datasets that cover both classical and pop genres. The MAESTRO dataset (Hawthorne et al., 2019) has about 1200 pieces of classical piano performances... In addition, we crawled about 14k MIDI files from Muscore... We also used two Pop piano datasets: Pop1k7 (Hsiao et al., 2021) and Pop909 (Wang et al., 2020)... |
| Dataset Splits | No | The paper mentions using specific datasets for training and testing, but does not provide explicit details about the train/validation/test splits (e.g., percentages, counts, or predefined split IDs) needed for reproduction. |
| Hardware Specification | Yes | All experiments are run on NVIDIA A100-SXM4 GPUs. |
| Software Dependencies | No | The paper mentions specific software components like AdamW (Loshchilov & Hutter, 2019) and the DiT architecture (Peebles & Xie, 2023), but it does not specify explicit version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The model was trained for 800k steps. The KL scheduler, λ_KL(t), increased linearly from 0 to 1e-2 over 400k steps. The denoising scheduler, λ_denoise, linearly increased the perturbation fraction from 0 to 25% over 400k steps. We employed a cosine learning rate scheduler with a 10k-step warmup, peaking at a learning rate of 5e-4. The optimizer was AdamW (Loshchilov & Hutter, 2019), with a weight decay of 0.01 and a batch size of 80. (A configuration sketch follows the table.) |
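The pseudocode row refers to Algorithms 1 and 3 (Stochastic Control Guided sampling). The core idea is that, at each reverse diffusion step, several candidate transitions are drawn and the one whose predicted clean sample best satisfies the non-differentiable rule is kept. Below is a minimal sketch of that selection loop; the noise-prediction model `eps_model`, the `rule_loss` function, the linear beta schedule, and the default candidate count are illustrative assumptions rather than the authors' exact implementation.

```python
# Minimal sketch of stochastic-control-style guided DDPM sampling.
# At each reverse step, draw several candidate next states and keep, per
# sample, the one whose one-step clean-sample estimate minimizes a
# (possibly non-differentiable) rule loss. All names are illustrative.
import torch

def scg_ddpm_sample(eps_model, rule_loss, shape, timesteps=1000,
                    n_candidates=4, device="cpu"):
    betas = torch.linspace(1e-4, 0.02, timesteps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)
    for t in reversed(range(timesteps)):
        a_t, ab_t = alphas[t], alpha_bars[t]
        t_batch = torch.full((shape[0],), t, device=device)
        eps = eps_model(x, t_batch)
        # Standard DDPM posterior mean and noise scale.
        mean = (x - (1 - a_t) / torch.sqrt(1 - ab_t) * eps) / torch.sqrt(a_t)
        sigma = torch.sqrt(betas[t])

        if t == 0:
            x = mean
            continue

        best_x, best_loss = None, None
        for _ in range(n_candidates):
            cand = mean + sigma * torch.randn_like(x)
            # One-step estimate of the clean sample x0 from the candidate.
            ab_prev = alpha_bars[t - 1]
            eps_c = eps_model(cand, torch.full((shape[0],), t - 1, device=device))
            x0_hat = (cand - torch.sqrt(1 - ab_prev) * eps_c) / torch.sqrt(ab_prev)
            loss = rule_loss(x0_hat)  # assumed shape: (batch,)
            if best_loss is None:
                best_x, best_loss = cand, loss
            else:
                better = (loss < best_loss).view(-1, *([1] * (x.dim() - 1)))
                best_x = torch.where(better, cand, best_x)
                best_loss = torch.minimum(loss, best_loss)
        x = best_x
    return x
```

Because the rule is only evaluated on a one-step estimate of the clean sample, rather than back-propagated through, this style of guidance accommodates non-differentiable rules such as the note-density and chord-progression targets discussed in the paper.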
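The experiment-setup row can also be read as a concrete training configuration. The following is a hedged PyTorch sketch of the reported settings (AdamW with weight decay 0.01, peak learning rate 5e-4 with a 10k-step warmup into a cosine decay over 800k total steps, batch size 80, and linear ramps for the KL weight and the perturbation fraction over 400k steps). The `model` placeholder, the `SequentialLR` composition, and the warmup start factor are assumptions, not the authors' released code.

```python
import torch

model = torch.nn.Linear(16, 16)  # placeholder for the actual DiT backbone
total_steps, warmup_steps = 800_000, 10_000
batch_size = 80  # reported batch size

# AdamW, weight decay 0.01, peak LR 5e-4 (as reported in the paper).
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)

# 10k-step linear warmup followed by cosine decay over the remaining steps.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, end_factor=1.0, total_iters=warmup_steps)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps])

def kl_weight(step, ramp_steps=400_000, max_weight=1e-2):
    """Linear KL-weight schedule: 0 -> 1e-2 over the first 400k steps."""
    return max_weight * min(step / ramp_steps, 1.0)

def perturb_fraction(step, ramp_steps=400_000, max_fraction=0.25):
    """Linear denoising schedule: perturbation fraction 0 -> 25% over 400k steps."""
    return max_fraction * min(step / ramp_steps, 1.0)
```

A stepwise scheduler (stepped once per optimizer update rather than per epoch) matches the paper's step-based description of warmup and decay; the exact decay floor is not reported, so none is assumed here.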