MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
Authors: Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence." |
| Researcher Affiliation | Collaboration | ¹Mila, Quebec Artificial Intelligence Institute, Université de Montréal; ²Northwestern University; ³New York University; ⁴Google Brain. Contact: wu.yusong@mila.quebec, {emanilow, annahuang, jesseengel}@google.com |
| Pseudocode | No | No explicitly labeled 'Pseudocode' or 'Algorithm' blocks were found. |
| Open Source Code | Yes | Online resources: Code: https://github.com/magenta/midi-ddsp |
| Open Datasets | Yes | To demonstrate modeling a variety of instruments, we use the URMP dataset (Li et al., 2018), a publicly-available audio dataset containing monophonic solo performances of a variety of instruments. |
| Dataset Splits | Yes | "The URMP dataset contains 3.75 hours of 117 unique solo recordings, where 85 recordings in 3 hours are used as the training set, and 35 recordings in 0.75 hours are used as the hold-out test set." and "We use solo recordings in piece number [3, 9, 11, 21, 24, 25, 33, 38, 39, 40, 41, 43] in URMP as test set, and the rest of distinct solo recordings in URMP as training set, as there are repeat use of solo recordings among different pieces." |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions several models and optimizers (e.g., CREPE, Adam optimizer, MelGAN structures) but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The Expression Generator, Synthesis Generator, and DDSP Inference Module are trained separately, for 5,000, 40,000, and 10,000 steps respectively. The DDSP Inference Module is optimized with Adam at a batch size of 16 and a learning rate of 3e-4. The Synthesis Generator is optimized with Adam at a batch size of 16 and a learning rate of 3e-4, with exponential learning-rate decay at a rate of 0.99 per 1,000 steps; its discriminator is optimized with Adam at a batch size of 16 and a learning rate of 1e-4. Loss coefficients are α = 1, β = 1, and γ = 10. The Expression Generator is trained on sequences of 64 notes with a batch size of 256, using the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-4. |
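The dataset-split row above states that test recordings are selected by URMP piece number, with repeated solo recordings deduplicated across pieces. A minimal sketch of that selection logic is below; the `(piece_number, recording_id)` metadata format and the `split_urmp` helper are assumptions for illustration, not the authors' code.

```python
# Piece numbers the paper assigns to the URMP hold-out test set.
TEST_PIECES = {3, 9, 11, 21, 24, 25, 33, 38, 39, 40, 41, 43}

def split_urmp(recordings):
    """Split URMP solo recordings into train/test lists by piece number.

    `recordings` is assumed to be an iterable of (piece_number, recording_id)
    tuples. Because some solo recordings are reused across pieces, each
    recording id is kept only the first time it appears.
    """
    train, test, seen = [], [], set()
    for piece, rec_id in recordings:
        if rec_id in seen:  # skip repeated use of the same solo recording
            continue
        seen.add(rec_id)
        (test if piece in TEST_PIECES else train).append(rec_id)
    return train, test
```

For example, a recording from piece 3 lands in the test list, a recording from piece 5 in the train list, and a duplicate entry is ignored.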
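The experiment-setup row describes the Synthesis Generator's schedule as Adam at 3e-4 with exponential decay of 0.99 per 1,000 steps. A hedged sketch of that decay curve, with the constants taken from the row and the `synthesis_lr` function name invented for illustration:

```python
# Learning-rate schedule reported for the Synthesis Generator:
# base rate 3e-4, multiplied by 0.99 every 1,000 steps (continuous form).
BASE_LR = 3e-4
DECAY_RATE = 0.99
DECAY_STEPS = 1000

def synthesis_lr(step: int) -> float:
    """Exponentially decayed learning rate at a given training step."""
    return BASE_LR * DECAY_RATE ** (step / DECAY_STEPS)
```

Over the reported 40,000 training steps this decays the rate only gently (0.99^40 ≈ 0.67 of the base rate), consistent with a near-constant schedule.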