DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Authors: Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at https://ditto-music.github.io/web/.' and Section 5 (Experimental Design).
Researcher Affiliation | Collaboration | University of California San Diego and Adobe Research.
Pseudocode | Yes | Algorithm 1: Diffusion Inference-Time T-Optimization (DITTO).
Open Source Code | No | The paper mentions 'Sound examples can be found at https://ditto-music.github.io/web/.' However, the website itself states 'Code (Coming Soon)', indicating the code is not yet publicly available.
Open Datasets | No | The paper states 'We train our models on a dataset of 1800 hours of licensed instrumental music with genre, mood, and tempo tags.' No link, DOI, or formal citation is provided to indicate that this custom training dataset is publicly available.
Dataset Splits | No | The paper does not provide explicit training/validation/test splits, whether as percentages, sample counts, or citations to predefined splits.
Hardware Specification | Yes | 'pre-train with distributed data parallel for 5 days on 32 A100 GPUs with a batch size of 24 per GPU' and 'trained on 8 A100 GPUs for 5 days'.
Software Dependencies | No | The paper mentions software such as the 'Adam optimizer' and 'UNet', but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, CUDA, or specific libraries).
Experiment Setup | Yes | 'We use Adam (Kingma & Ba, 2014) as our optimizer for DITTO, with a learning rate of 5 × 10⁻³ (as higher leads to stability issues). We use DDIM (Song et al., 2020) sampling with 20 steps and dynamic thresholding (Saharia et al., 2022b) for all experiments. No optimizer hyperparameters were changed across applications besides the max number of optimization steps, which were doubled from 70 to 150 for the melody and structure tasks.' A minimal sketch combining these settings is given after the table.
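
The Pseudocode and Experiment Setup rows together describe an inference-time noise-latent optimization loop with concrete hyperparameters (Adam at 5 × 10⁻³, 20-step DDIM sampling, 70 optimization steps, 150 for melody/structure control). Below is a minimal, hypothetical Python/PyTorch sketch of what such a loop could look like under those settings. ToyDenoiser, the toy noise schedule, the hard clamp standing in for dynamic thresholding, and the placeholder loss are illustrative assumptions, not the paper's (unreleased) model, schedule, or task-specific feature-matching losses, and the memory-saving tricks a full-scale model would need are omitted.

# Hypothetical sketch of an inference-time noise-latent optimization loop,
# using only the hyperparameters quoted above. Not the authors' released code.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for the paper's latent-diffusion UNet (not publicly released)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x, t):
        # Condition on the (normalized, scalar) timestep by simple concatenation.
        t_feat = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_feat], dim=-1))

def ddim_sample(model, x_T, alphas_cumprod, num_steps=20):
    """Deterministic DDIM sampling (eta = 0), kept differentiable w.r.t. x_T."""
    T = len(alphas_cumprod)
    step_indices = torch.linspace(T - 1, 0, num_steps).long()
    x = x_T
    for i in range(num_steps):
        t = step_indices[i]
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[step_indices[i + 1]] if i + 1 < num_steps else torch.tensor(1.0)
        eps = model(x, t.float().view(1, 1) / T)
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x0_pred = x0_pred.clamp(-1.0, 1.0)  # crude stand-in for dynamic thresholding
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    return x

def ditto_optimize(model, loss_fn, shape, max_steps=70, lr=5e-3):
    """Optimize the initial noise latent x_T so the sampled output minimizes loss_fn."""
    for p in model.parameters():
        p.requires_grad_(False)                       # diffusion model stays frozen
    alphas_cumprod = torch.linspace(0.999, 0.01, 1000)  # toy noise schedule
    x_T = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x_T], lr=lr)              # lr = 5e-3 as reported
    for _ in range(max_steps):                        # 70 steps; 150 for melody/structure
        opt.zero_grad()
        x0 = ddim_sample(model, x_T, alphas_cumprod, num_steps=20)
        loss = loss_fn(x0)                            # task-specific control objective
        loss.backward()                               # backprop through the sampler
        opt.step()
    return x_T.detach()

if __name__ == "__main__":
    model = ToyDenoiser(dim=64)
    target = torch.zeros(1, 64)
    # Placeholder objective; the paper uses feature-matching losses per control task.
    x_T = ditto_optimize(model, lambda x0: ((x0 - target) ** 2).mean(), shape=(1, 64))
    print("optimized initial latent:", x_T.shape)

Note that backpropagating through all 20 sampling steps of a full-size audio diffusion model is memory-intensive; a practical implementation would need something like gradient checkpointing through the sampler, which this sketch leaves out for brevity.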