MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Authors: Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS", "Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence."
Researcher Affiliation | Collaboration | "1Mila, Quebec Artificial Intelligence Institute, Université de Montréal, 2Northwestern University, 3New York University, 4Google Brain; wu.yusong@mila.quebec, {emanilow, annahuang, jesseengel}@google.com"
Pseudocode | No | No explicitly labeled 'Pseudocode' or 'Algorithm' blocks were found.
Open Source Code | Yes | "Online resources: Code: https://github.com/magenta/midi-ddsp"
Open Datasets | Yes | "To demonstrate modeling a variety of instruments, we use the URMP dataset (Li et al., 2018), a publicly-available audio dataset containing monophonic solo performances of a variety of instruments."
Dataset Splits | No | "The URMP dataset contains 3.75 hours of 117 unique solo recordings, where 85 recordings in 3 hours are used as the training set, and 35 recordings in 0.75 hours are used as the hold-out test set." and "We use solo recordings in piece number [3, 9, 11, 21, 24, 25, 33, 38, 39, 40, 41, 43] in URMP as test set, and the rest of distinct solo recordings in URMP as training set, as there are repeat use of solo recordings among different pieces." (a Python sketch of this split follows the table)
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory) used for running the experiments were mentioned.
Software Dependencies | No | The paper mentions several models and optimizers (e.g., CREPE, the Adam optimizer, MelGAN structures) but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "The Expression Generator, Synthesis Generator, and DDSP Inference Module are trained separately. The Expression Generator is trained for 5000 steps, the Synthesis Generator is trained for 40000 steps, and the DDSP Inference Module is trained for 10000 steps. The DDSP Inference Module is optimized via Adam optimizer in a batch size of 16 and a learning rate of 3e-4. The Synthesis Generator is optimized via Adam optimizer in a batch size of 16 and a learning rate of 0.0003, with an exponential learning rate decay at a rate of 0.99 per 1000 steps. The discriminator is optimized using Adam optimizer in a batch size of 16 and a learning rate of 0.0001. α = 1, β = 1, and γ = 10 are used for loss coefficients. The Expression Generator is trained on a sequence length of 64 notes and a batch size of 256. Adam optimizer (Kingma & Ba, 2014) is used in training with a learning rate of 0.0001." (these optimizer settings are sketched in code after the table)
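
The dataset-split row above lists the URMP piece numbers used for testing. As a minimal sketch of how that split could be reproduced from a local copy of URMP, the snippet below partitions solo-stem files by piece number; the filename pattern it parses (e.g. AuSep_1_vn_01_Jupiter.wav, with the piece number after the instrument tag) is an assumption about the URMP naming scheme rather than something specified in the paper.

```python
# Sketch of the train/test split by URMP piece number described above.
# The filename pattern is an assumed URMP naming convention, not from the paper.
import re
from pathlib import Path

TEST_PIECES = {3, 9, 11, 21, 24, 25, 33, 38, 39, 40, 41, 43}


def piece_number(path: Path) -> int:
    """Extract the piece number from a URMP separated-stem filename."""
    # Assumed pattern: AuSep_<track>_<instrument>_<piece>_<name>.wav
    match = re.match(r"AuSep_\d+_[a-z]+_(\d+)_", path.name)
    if match is None:
        raise ValueError(f"Unexpected filename: {path.name}")
    return int(match.group(1))


def split_urmp(recordings):
    """Partition solo recordings into (train, test) lists by piece number."""
    train, test = [], []
    for path in recordings:
        (test if piece_number(path) in TEST_PIECES else train).append(path)
    return train, test


# Example usage (path is hypothetical):
# train_files, test_files = split_urmp(Path("URMP/Dataset").rglob("AuSep_*.wav"))
```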
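The experiment-setup row reports optimizers, learning rates, and a decay schedule but no reference implementation. The snippet below is a minimal sketch of how those reported settings could be instantiated in TensorFlow/Keras (chosen because the released code builds on DDSP, a TensorFlow library); the variable names are placeholders and this is not the authors' training code.

```python
# Sketch of the reported optimizer settings, assuming TensorFlow/Keras.
# Each module is trained separately with its own optimizer, per the paper.
import tensorflow as tf

# Expression Generator: Adam, lr 1e-4, batch size 256, 5000 steps.
expression_gen_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)

# Synthesis Generator: Adam, lr 3e-4 with exponential decay
# (rate 0.99 every 1000 steps), batch size 16, 40000 steps.
synthesis_gen_lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=3e-4, decay_steps=1000, decay_rate=0.99)
synthesis_gen_opt = tf.keras.optimizers.Adam(learning_rate=synthesis_gen_lr)

# Discriminator for the Synthesis Generator: Adam, lr 1e-4, batch size 16.
discriminator_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)

# DDSP Inference Module: Adam, lr 3e-4, batch size 16, 10000 steps.
ddsp_inference_opt = tf.keras.optimizers.Adam(learning_rate=3e-4)

# Loss coefficients reported in the paper.
ALPHA, BETA, GAMMA = 1.0, 1.0, 10.0
```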