Play as You Like: Timbre-Enhanced Multi-Modal Music Style Transfer

Authors: Chien-Yu Lu, Min-Xin Xue, Chia-Che Chang, Che-Rung Lee, Li Su (pp. 1061-1068)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer with improved sound quality and in allowing users to manipulate the output. Subjective tests were conducted to evaluate the style transfer system from humans' perspective. Table 1 shows the Mean Opinion Scores (MOS) of the listening test collected from 182 responses. (A hedged sketch of MOS aggregation appears after this table.)
Researcher Affiliation | Academia | Chien-Yu Lu,1 Min-Xin Xue,1* Chia-Che Chang,1 Che-Rung Lee,1 Li Su2 (1 Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan; 2 Institute of Information Science, Academia Sinica, Taipei, Taiwan)
Pseudocode | No | The paper describes the system and methods using text and diagrams (Figures 1 and 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Codes and listening examples of this work are announced online at: https://github.com/ChienYuLu/Play-As-You-Like-Timbre-Enhanced-Multi-modal-Music-Style-Transfer
Open Datasets | No | The paper describes the data used as 'classical piano solo (Nocturne Complete Works performed by Vladimir Ashkenazy) and classical string quartet (Bruch's Complete String Quartet)' and 'popular piano solo and popular guitar solo (data of both domains consists in 34 piano solos (8,200 seconds) and 56 guitar solos (7,800 seconds) covered by the pianists and guitarists on YouTube. Please see supplementary materials for details)'. While these describe the sources of the data, no specific public dataset name, URL, DOI, or formal citation is provided for direct access to the datasets used in the experiments.
Dataset Splits | No | The paper mentions a 'training stage' and 'evaluation' but does not explicitly provide training, validation, or test dataset splits (e.g., percentages, counts, or a cross-validation setup).
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions that 'The adopted networks are mostly based on the MUNIT implementation except for the RaGAN in adversarial training' and that 'The model is optimized by Adam', but it does not specify version numbers for these software components or libraries. (A sketch of the standard RaGAN criterion appears after this table.)
Experiment Setup | Yes | The model is optimized by Adam with a batch size of one, and with the learning rate and weight decay rate both set to 0.0001. The regularization parameters in Eqs. (13) and (14) are λr = 10, λs = λc = 1, and λMFCC = λ = λenv = 1. The sampling rate of the music signals is fs = 22.05 kHz. The window size and hop size for the STFT are 2048 and 256 samples, respectively. The dimension of the style code is 8. (The last sketch after this table mirrors these settings in code.)
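
The Research Type row cites Mean Opinion Scores gathered from 182 listening-test responses. As a point of reference only, here is a minimal Python sketch of how MOS is conventionally aggregated; the ratings are randomly generated placeholders on an assumed 1-to-5 scale, since the paper's raw responses are not published, and nothing here reproduces Table 1.

```python
# Hypothetical MOS aggregation; the 1-5 scale and the ratings themselves
# are assumptions, not the paper's data. Only the sample size (182) is real.
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=182)  # placeholder 1-5 opinion scores

mos = ratings.mean()                                       # Mean Opinion Score
ci95 = 1.96 * ratings.std(ddof=1) / np.sqrt(len(ratings))  # normal-approx. 95% CI
print(f"MOS = {mos:.2f} +/- {ci95:.2f} (n = {len(ratings)})")
```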
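
The Software Dependencies row singles out RaGAN as the one departure from the stock MUNIT implementation. The paper itself contains no code for this, so the sketch below follows the standard relativistic average GAN formulation (Jolicoeur-Martineau, 2018) in PyTorch; `critic`, `real`, and `fake` are hypothetical stand-ins, not the authors' modules.

```python
# Standard RaGAN (relativistic average GAN) losses as commonly implemented;
# this illustrates the named technique, not the authors' actual code.
import torch
import torch.nn.functional as F

def ragan_d_loss(critic, real, fake):
    """Discriminator loss: real samples should score higher than the
    average fake sample, and fakes lower than the average real one."""
    c_real, c_fake = critic(real), critic(fake.detach())
    return (F.binary_cross_entropy_with_logits(
                c_real - c_fake.mean(), torch.ones_like(c_real))
            + F.binary_cross_entropy_with_logits(
                c_fake - c_real.mean(), torch.zeros_like(c_fake)))

def ragan_g_loss(critic, real, fake):
    """Generator loss: the relativistic targets are swapped."""
    c_real, c_fake = critic(real), critic(fake)
    return (F.binary_cross_entropy_with_logits(
                c_fake - c_real.mean(), torch.ones_like(c_fake))
            + F.binary_cross_entropy_with_logits(
                c_real - c_fake.mean(), torch.zeros_like(c_real)))
```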
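
Finally, the Experiment Setup row is concrete enough to mirror directly. The sketch below wires together the stated signal front end (fs = 22.05 kHz, 2048-sample window, 256-sample hop) and optimizer settings (Adam, learning rate and weight decay both 1e-4, batch size one) using librosa and PyTorch; the `generator` module and the input filename are placeholders, since the authors' actual networks follow the MUNIT implementation.

```python
# Preprocessing and optimizer settings as stated in the paper; the model
# itself is a placeholder, since the real networks follow MUNIT.
import librosa
import torch

FS = 22050        # sampling rate fs = 22.05 kHz
N_FFT = 2048      # STFT window size (samples)
HOP = 256         # STFT hop size (samples)
STYLE_DIM = 8     # dimension of the style code

y, _ = librosa.load("example.wav", sr=FS)                 # placeholder input clip
mag = abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP))   # magnitude spectrogram

style = torch.randn(1, STYLE_DIM)  # a sampled 8-dimensional style code
generator = torch.nn.Linear(STYLE_DIM, STYLE_DIM)  # stand-in for the MUNIT networks
optimizer = torch.optim.Adam(generator.parameters(),
                             lr=1e-4, weight_decay=1e-4)  # batch size is one
```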