Music Style Transfer with Time-Varying Inversion of Diffusion Models
Authors: Sifei Li, Yuxin Zhang, Fan Tang, Chongyang Ma, Weiming Dong, Changsheng Xu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our method can transfer the style of specific instruments, as well as incorporate natural sounds to compose melodies. Samples and source code are available at https://lsfhuihuiff.github.io/MusicTI/. We conducted qualitative evaluation, quantitative evaluation, and an ablation study to demonstrate the effectiveness of our method, which performs well in both content preservation and style fit. |
| Researcher Affiliation | Collaboration | Sifei Li¹,², Yuxin Zhang¹,², Fan Tang³, Chongyang Ma⁴, Weiming Dong¹,²*, Changsheng Xu¹,². ¹MAIS, Institute of Automation, Chinese Academy of Sciences; ²School of Artificial Intelligence, University of Chinese Academy of Sciences; ³Institute of Computing Technology, Chinese Academy of Sciences; ⁴Kuaishou Technology |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Samples and source code are available at https://lsfhuihuiff.github.io/MusicTI/. |
| Open Datasets | Yes | We collected a small-scale dataset from a website (https://pixabay.com) where all the content is free for use. |
| Dataset Splits | No | The paper describes the total number of clips and their categories (style/content) but does not provide specific train, validation, or test dataset splits. |
| Hardware Specification | Yes | The training process on each style takes approximately 30 minutes using an NVIDIA GeForce RTX 3090 with a batch size of 1, compared with the more than 60 minutes required for TI. |
| Software Dependencies | No | The paper mentions software components and models like Riffusion, LDMs, CLIP, DDIM, VAE, and Griffin-Lim, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use the default hyperparameters of LDMs and set a base learning rate of 0.001. ... During inference, our approach employs two hyperparameters: strength and scale. ... We achieved the best results when strength ranged from 0.6 to 0.7 and the scale ranged from 3.0 to 5.0. |
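
The inference hyperparameters quoted above (strength 0.6 to 0.7, scale 3.0 to 5.0) map directly onto a standard img2img diffusion call. Below is a minimal sketch of such a call, assuming the public Riffusion checkpoint `riffusion/riffusion-model-v1` and Hugging Face `diffusers`; the file names and prompt are hypothetical, and the paper's time-varying inversion embeddings are not reproduced here.

```python
# Minimal inference sketch, NOT the authors' implementation: it uses diffusers'
# generic img2img pipeline with the Riffusion checkpoint to illustrate how the
# paper's "strength" and "scale" hyperparameters enter the call.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "riffusion/riffusion-model-v1", torch_dtype=torch.float16
).to("cuda")

# Content mel-spectrogram rendered as an RGB image (hypothetical file name).
content_spec = Image.open("content_spectrogram.png").convert("RGB")

# "strength" controls how much of the content spectrogram is noised before
# denoising toward the style prompt; "guidance_scale" corresponds to the
# paper's "scale". The paper reports best results with strength in [0.6, 0.7]
# and scale in [3.0, 5.0].
result = pipe(
    prompt="music in the style of *",  # "*" stands for a learned style token
    image=content_spec,
    strength=0.65,
    guidance_scale=4.0,
).images[0]
result.save("stylized_spectrogram.png")
```

Lower strength keeps more of the content spectrogram, while higher strength lets the style prompt dominate, which matches the content-preservation versus style-fit trade-off the paper evaluates.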