MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Authors: Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 EXPERIMENTS", "Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence."
Researcher Affiliation | Collaboration | "1Mila, Quebec Artificial Intelligence Institute, Université de Montréal, 2Northwestern University, 3New York University, 4Google Brain; wu.yusong@mila.quebec, {emanilow, annahuang, jesseengel}@google.com"
Pseudocode | No | No explicitly labeled 'Pseudocode' or 'Algorithm' blocks were found.
Open Source Code | Yes | "Online resources: Code: https://github.com/magenta/midi-ddsp"
Open Datasets | Yes | "To demonstrate modeling a variety of instruments, we use the URMP dataset (Li et al., 2018), a publicly-available audio dataset containing monophonic solo performances of a variety of instruments."
Dataset Splits | No | "The URMP dataset contains 3.75 hours of 117 unique solo recordings, where 85 recordings in 3 hours are used as the training set, and 35 recordings in 0.75 hours are used as the hold-out test set." and "We use solo recordings in piece number [3, 9, 11, 21, 24, 25, 33, 38, 39, 40, 41, 43] in URMP as test set, and the rest of distinct solo recordings in URMP as training set, as there are repeat use of solo recordings among different pieces." (a Python sketch of this split follows the table)
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory) used for running the experiments were mentioned.
Software Dependencies | No | The paper mentions several models and optimizers (e.g., CREPE, the Adam optimizer, MelGAN structures) but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "The Expression Generator, Synthesis Generator, and DDSP Inference Module are trained separately. The Expression Generator is trained for 5000 steps, the Synthesis Generator is trained for 40000 steps, and the DDSP Inference Module is trained for 10000 steps. The DDSP Inference Module is optimized via Adam optimizer in a batch size of 16 and a learning rate of 3e-4. The Synthesis Generator is optimized via Adam optimizer in a batch size of 16 and a learning rate of 0.0003, with an exponential learning rate decay at a rate of 0.99 per 1000 steps. The discriminator is optimized using Adam optimizer in a batch size of 16 and a learning rate of 0.0001. α = 1, β = 1, and γ = 10 are used for loss coefficients. The Expression Generator is trained on a sequence length of 64 notes and a batch size of 256. Adam optimizer (Kingma & Ba, 2014) is used in training with a learning rate of 0.0001." (these optimizer settings are sketched in code after the table)
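
The dataset-split row above lists the URMP piece numbers used for testing. As a minimal sketch of how that split could be reproduced from a local copy of URMP, the snippet below partitions solo-stem files by piece number; the filename pattern it parses (e.g. AuSep_1_vn_01_Jupiter.wav, with the piece number after the instrument tag) is an assumption about the URMP naming scheme rather than something specified in the paper.

```python
# Sketch of the train/test split by URMP piece number described above.
# The filename pattern is an assumed URMP naming convention, not from the paper.
import re
from pathlib import Path

TEST_PIECES = {3, 9, 11, 21, 24, 25, 33, 38, 39, 40, 41, 43}


def piece_number(path: Path) -> int:
    """Extract the piece number from a URMP separated-stem filename."""
    # Assumed pattern: AuSep_<track>_<instrument>_<piece>_<name>.wav
    match = re.match(r"AuSep_\d+_[a-z]+_(\d+)_", path.name)
    if match is None:
        raise ValueError(f"Unexpected filename: {path.name}")
    return int(match.group(1))


def split_urmp(recordings):
    """Partition solo recordings into (train, test) lists by piece number."""
    train, test = [], []
    for path in recordings:
        (test if piece_number(path) in TEST_PIECES else train).append(path)
    return train, test


# Example usage (path is hypothetical):
# train_files, test_files = split_urmp(Path("URMP/Dataset").rglob("AuSep_*.wav"))
```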
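The experiment-setup row reports optimizers, learning rates, and a decay schedule but no reference implementation. The snippet below is a minimal sketch of how those reported settings could be instantiated in TensorFlow/Keras (chosen because the released code builds on DDSP, a TensorFlow library); the variable names are placeholders and this is not the authors' training code.

```python
# Sketch of the reported optimizer settings, assuming TensorFlow/Keras.
# Each module is trained separately with its own optimizer, per the paper.
import tensorflow as tf

# Expression Generator: Adam, lr 1e-4, batch size 256, 5000 steps.
expression_gen_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)

# Synthesis Generator: Adam, lr 3e-4 with exponential decay
# (rate 0.99 every 1000 steps), batch size 16, 40000 steps.
synthesis_gen_lr = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=3e-4, decay_steps=1000, decay_rate=0.99)
synthesis_gen_opt = tf.keras.optimizers.Adam(learning_rate=synthesis_gen_lr)

# Discriminator for the Synthesis Generator: Adam, lr 1e-4, batch size 16.
discriminator_opt = tf.keras.optimizers.Adam(learning_rate=1e-4)

# DDSP Inference Module: Adam, lr 3e-4, batch size 16, 10000 steps.
ddsp_inference_opt = tf.keras.optimizers.Adam(learning_rate=3e-4)

# Loss coefficients reported in the paper.
ALPHA, BETA, GAMMA = 1.0, 1.0, 10.0
```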