Play as You Like: Timbre-Enhanced Multi-Modal Music Style Transfer
Authors: Chien-Yu Lu, Min-Xin Xue, Chia-Che Chang, Che-Rung Lee, Li Su (pp. 1061-1068)
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer with improved sound quality and in allowing users to manipulate the output. Subjective tests were conducted to evaluate the style transfer system from the human's perspective. Table 1 shows the Mean Opinion Scores (MOS) of the listening test collected from 182 responses. |
| Researcher Affiliation | Academia | Chien-Yu Lu,1 Min-Xin Xue,1* Chia-Che Chang,1 Che-Rung Lee,1 Li Su2 1Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan 2Institute of Information Science, Academia Sinica, Taipei, Taiwan |
| Pseudocode | No | The paper describes the system and methods using text and diagrams (Figure 1, Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes and listening examples of this work are announced online at: https://github.com/ChienYuLu/Play-As-You-Like-Timbre-Enhanced-Multi-modal-Music-Style-Transfer |
| Open Datasets | No | The paper describes the data used as 'classical piano solo (Nocturne Complete Works performed by Vladimir Ashkenazy) and classical string quartet (Bruch's Complete String Quartet)' and 'popular piano solo and popular guitar solo (data of both domains consists of 34 piano solos (8,200 seconds) and 56 guitar solos (7,800 seconds) covered by the pianists and guitarists on YouTube. Please see supplementary materials for details)'. While these describe the source of the data, no specific public dataset name, URL, DOI, or formal citation is provided for direct access to the datasets used in the experiments. |
| Dataset Splits | No | The paper mentions 'training stage' and 'evaluation' but does not explicitly provide details about specific training, validation, or test dataset splits (e.g., percentages, counts, or cross-validation setup). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions that 'The adopted networks are mostly based on the MUNIT implementation except for the RaGAN in adversarial training' and 'The model is optimized by Adam', but it does not specify version numbers for these software components or libraries. |
| Experiment Setup | Yes | The model is optimized by Adam, with the batch size being one, and with the learning rate and weight decay rate being both 0.0001. The regularization parameters in (13) and (14) are: λr = 10, λs = λc = 1, and λMFCC = λ = λenv = 1. The sampling rate of music signals is fs = 22.05 kHz. The window size and hop size for STFT are 2048 and 256 samples, respectively. The dimension of the style code is 8. |
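The signal-processing settings reported in the Experiment Setup row can be sketched in code. The sketch below is illustrative only: the function and constant names are our own, not taken from the paper's repository, and it assumes a standard Hann-windowed magnitude STFT; the commented optimizer line mirrors the reported Adam hyperparameters in PyTorch-style syntax.

```python
import numpy as np

# Hyperparameters as reported in the paper's Experiment Setup.
FS = 22050          # sampling rate, 22.05 kHz
N_FFT = 2048        # STFT window size (samples)
HOP = 256           # STFT hop size (samples)
STYLE_DIM = 8       # dimension of the style code

def stft_magnitude(signal: np.ndarray) -> np.ndarray:
    """Magnitude STFT with the paper's window/hop sizes (Hann window assumed)."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(signal) - N_FFT) // HOP
    frames = np.stack([
        signal[i * HOP : i * HOP + N_FFT] * window for i in range(n_frames)
    ])
    # rfft over each windowed frame; transpose to (freq_bins, n_frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Optimizer settings as reported (Adam, batch size 1, lr and weight decay 1e-4);
# PyTorch-style, illustrative:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
```

One second of audio at this sampling rate yields a spectrogram of 1025 frequency bins by 79 frames, which is the input resolution the style-transfer networks operate on.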