Music Style Transfer with Time-Varying Inversion of Diffusion Models

Authors: Sifei Li, Yuxin Zhang, Fan Tang, Chongyang Ma, Weiming Dong, Changsheng Xu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate that our method can transfer the style of specific instruments, as well as incorporate natural sounds to compose melodies. Samples and source code are available at https://lsfhuihuiff.github.io/Music TI/. We conducted qualitative evaluation, quantitative evaluation and ablation study to demonstrate the effectiveness of our method, which performs well in both content preservation and style fit."
Researcher Affiliation | Collaboration | Sifei Li (1,2), Yuxin Zhang (1,2), Fan Tang (3), Chongyang Ma (4), Weiming Dong (1,2)*, Changsheng Xu (1,2). Affiliations: 1: MAIS, Institute of Automation, Chinese Academy of Sciences; 2: School of Artificial Intelligence, University of Chinese Academy of Sciences; 3: Institute of Computing Technology, Chinese Academy of Sciences; 4: Kuaishou Technology
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Samples and source code are available at https://lsfhuihuiff.github.io/Music TI/."
Open Datasets | Yes | "We collected a small-scale dataset from a website (https://pixabay.com) where all the content is free for use."
Dataset Splits | No | The paper reports the total number of clips and their style/content categories but does not specify train, validation, or test splits.
Hardware Specification | Yes | "The training process on each style takes approximately 30 minutes using an NVIDIA GeForce RTX 3090 with a batch size of 1, less than the more than 60 minutes required for TI."
Software Dependencies | No | The paper mentions software components and models such as Riffusion, LDMs, CLIP, DDIM, VAE, and Griffin-Lim, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | "We use the default hyperparameters of LDMs and set a base learning rate of 0.001. ... During inference, our approach employs two hyperparameters: strength and scale. ... We achieved the best results when strength ranged from 0.6 to 0.7 and the scale ranged from 3.0 to 5.0."
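The strength and scale hyperparameters quoted above follow the usual img2img diffusion convention: strength controls how far along the diffusion trajectory the input mel-spectrogram is re-noised, and scale is the classifier-free guidance weight. A minimal sketch of that convention, assuming a Riffusion/LDM-style pipeline (function names here are illustrative, not taken from the paper's code):

```python
def start_step(strength: float, num_inference_steps: int = 50) -> int:
    """Index of the first denoising step actually run in an img2img pass.

    With strength s in [0, 1], the input (here, a mel-spectrogram) is
    noised to fraction s of the diffusion trajectory, and only the final
    s * num_inference_steps steps are denoised: strength = 1.0 discards
    the input entirely, strength = 0.0 returns it unchanged.
    """
    init = min(int(num_inference_steps * strength), num_inference_steps)
    return num_inference_steps - init


def guided_noise(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Classifier-free guidance: push the conditional noise prediction
    away from the unconditional one by the guidance weight 'scale'."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Under this reading, the paper's best range (strength 0.6 to 0.7 over a 50-step schedule) re-denoises roughly the last 30 to 35 steps, consistent with the trade-off the report notes: higher strength gives stronger stylization at the cost of content preservation, while scale 3.0 to 5.0 weights adherence to the style condition.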