DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Authors: Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at https://ditto-music.github.io/web/.' and Section 5 (Experimental Design).
Researcher Affiliation | Collaboration | University of California San Diego and Adobe Research.
Pseudocode | Yes | Algorithm 1: Diffusion Inference-Time T-Optimization (DITTO).
Open Source Code | No | The paper mentions 'Sound examples can be found at https://ditto-music.github.io/web/.' However, the website itself states 'Code (Coming Soon)', indicating the code is not yet publicly available.
Open Datasets | No | The paper states 'We train our models on a dataset of 1800 hours of licensed instrumental music with genre, mood, and tempo tags.' No link, DOI, or formal citation is provided to indicate that this custom training dataset is publicly available.
Dataset Splits | No | The paper does not provide explicit training/validation/test splits, whether as percentages, sample counts, or citations to predefined splits.
Hardware Specification | Yes | 'pre-train with distributed data parallel for 5 days on 32 A100 GPUs with a batch size of 24 per GPU' and 'trained on 8 A100 GPUs for 5 days'.
Software Dependencies | No | The paper mentions software such as the 'Adam optimizer' and 'UNet', but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, CUDA, or specific libraries).
Experiment Setup | Yes | 'We use Adam (Kingma & Ba, 2014) as our optimizer for DITTO, with a learning rate of 5 × 10⁻³ (as higher leads to stability issues). We use DDIM (Song et al., 2020) sampling with 20 steps and dynamic thresholding (Saharia et al., 2022b) for all experiments. No optimizer hyperparameters were changed across applications besides the max number of optimization steps, which were doubled from 70 to 150 for the melody and structure tasks.' A minimal sketch combining these settings is given after the table.
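
The Pseudocode and Experiment Setup rows together describe an inference-time noise-latent optimization loop with concrete hyperparameters (Adam at 5 × 10⁻³, 20-step DDIM sampling, 70 optimization steps, 150 for melody/structure control). Below is a minimal, hypothetical Python/PyTorch sketch of what such a loop could look like under those settings. ToyDenoiser, the toy noise schedule, the hard clamp standing in for dynamic thresholding, and the placeholder loss are illustrative assumptions, not the paper's (unreleased) model, schedule, or task-specific feature-matching losses, and the memory-saving tricks a full-scale model would need are omitted.

# Hypothetical sketch of an inference-time noise-latent optimization loop,
# using only the hyperparameters quoted above. Not the authors' released code.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for the paper's latent-diffusion UNet (not publicly released)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x, t):
        # Condition on the (normalized, scalar) timestep by simple concatenation.
        t_feat = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_feat], dim=-1))

def ddim_sample(model, x_T, alphas_cumprod, num_steps=20):
    """Deterministic DDIM sampling (eta = 0), kept differentiable w.r.t. x_T."""
    T = len(alphas_cumprod)
    step_indices = torch.linspace(T - 1, 0, num_steps).long()
    x = x_T
    for i in range(num_steps):
        t = step_indices[i]
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[step_indices[i + 1]] if i + 1 < num_steps else torch.tensor(1.0)
        eps = model(x, t.float().view(1, 1) / T)
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x0_pred = x0_pred.clamp(-1.0, 1.0)  # crude stand-in for dynamic thresholding
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    return x

def ditto_optimize(model, loss_fn, shape, max_steps=70, lr=5e-3):
    """Optimize the initial noise latent x_T so the sampled output minimizes loss_fn."""
    for p in model.parameters():
        p.requires_grad_(False)                       # diffusion model stays frozen
    alphas_cumprod = torch.linspace(0.999, 0.01, 1000)  # toy noise schedule
    x_T = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x_T], lr=lr)              # lr = 5e-3 as reported
    for _ in range(max_steps):                        # 70 steps; 150 for melody/structure
        opt.zero_grad()
        x0 = ddim_sample(model, x_T, alphas_cumprod, num_steps=20)
        loss = loss_fn(x0)                            # task-specific control objective
        loss.backward()                               # backprop through the sampler
        opt.step()
    return x_T.detach()

if __name__ == "__main__":
    model = ToyDenoiser(dim=64)
    target = torch.zeros(1, 64)
    # Placeholder objective; the paper uses feature-matching losses per control task.
    x_T = ditto_optimize(model, lambda x0: ((x0 - target) ** 2).mean(), shape=(1, 64))
    print("optimized initial latent:", x_T.shape)

Note that backpropagating through all 20 sampling steps of a full-size audio diffusion model is memory-intensive; a practical implementation would need something like gradient checkpointing through the sampler, which this sketch leaves out for brevity.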