Musical Composition Style Transfer via Disentangled Timbre Representations

Authors: Yun-Ning Hung, I-Tung Chiang, Yi-An Chen, Yi-Hsuan Yang

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the effectiveness of the models by experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle.
Researcher Affiliation | Collaboration | Yun-Ning Hung (1), I-Tung Chiang (1), Yi-An Chen (2), and Yi-Hsuan Yang (1); (1) Research Center for IT Innovation, Academia Sinica, Taiwan; (2) KKBOX Inc., Taiwan
Pseudocode | No | The paper describes the architecture of the proposed models and their components, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle.
Open Datasets | Yes | We use the newly released MuseScore dataset [Hung et al., 2019] to train the proposed models. This dataset contains 344,166 paired MIDI and MP3 files.
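As a quick illustration of how such paired symbolic/audio data is typically consumed, the sketch below loads one MIDI file into a piano-roll representation with pypianoroll, the package the paper mentions. This is a minimal sketch, not the authors' pipeline; it assumes pypianoroll ≥ 1.0 and uses a hypothetical file path.

```python
# Minimal sketch (not from the paper): inspect one MIDI file from a
# MuseScore-style paired MIDI/MP3 dataset using pypianoroll >= 1.0.
import pypianoroll

multitrack = pypianoroll.read("musescore/0001.mid")  # hypothetical path
for track in multitrack.tracks:
    # Each track carries a (time, 128) piano-roll matrix plus instrument metadata.
    print(track.name, track.program, track.is_drum, track.pianoroll.shape)
```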
Dataset Splits | Yes | We train additional instrument classifiers Dt with the pre-defined training split of the M&M dataset (200 songs) [Hung et al., 2019]. We use the estimate of Dt as the predicted instrument roll. Table 1 shows the evaluation result on the pre-defined test split of the M&M dataset (69 songs) [Hung et al., 2019] for four state-of-the-art models (i.e., the first four rows) and our models (the middle two rows), considering only the five most popular instruments, as in [Hung et al., 2019].
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using the 'librosa library' and the 'pypianoroll package' but does not specify their version numbers, nor any other software dependencies with versions (e.g., Python, PyTorch, TensorFlow).
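Since the report flags the missing version numbers, one way a replication can at least record the environment it actually ran with is sketched below; this is a generic snippet using Python's importlib.metadata, not something taken from the paper.

```python
# Record the versions of the libraries the paper names, since no
# versions are pinned in the paper itself.
from importlib.metadata import version

for pkg in ("librosa", "pypianoroll"):
    print(pkg, version(pkg))
```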
Experiment Setup | Yes | The initial learning rate is set to 0.005. We compute CQT with the librosa library [McFee et al., 2015], with 16,000 Hz sampling rate and 512-sample window size, again with no overlaps. We use a frequency scale of 88 bins, with 12 bins per octave to represent each note. Hence, F = 88 (bins) and T = 312 (frames). Both DuoED and UnetED are trained using stochastic gradient descent with momentum 0.9.
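The quoted settings are concrete enough to sketch the input features and optimizer configuration. The snippet below is a minimal, hedged reconstruction: the audio path, the 10-second excerpt length, the A0 lower bound for the 88-bin scale, and the placeholder model are assumptions rather than details confirmed by the paper, and PyTorch is used only to illustrate the stated SGD hyperparameters.

```python
# Minimal sketch of the reported feature and optimizer settings.
# Assumptions: hypothetical audio file, 10-second excerpt, fmin = A0,
# placeholder model; the paper does not confirm PyTorch.
import librosa
import numpy as np
import torch

# CQT at 16,000 Hz, 512-sample hop (no overlap), 88 bins, 12 bins per octave;
# a 10-second excerpt gives roughly T = 312 frames (16000 * 10 / 512 ≈ 312),
# matching F = 88 and T = 312 in the quoted setup.
y, sr = librosa.load("example.mp3", sr=16000, duration=10.0)  # hypothetical file
cqt = np.abs(
    librosa.cqt(y, sr=sr, hop_length=512, n_bins=88, bins_per_octave=12,
                fmin=librosa.note_to_hz("A0"))
)
print(cqt.shape)  # (88, ~313)

# SGD with momentum 0.9 and initial learning rate 0.005, as reported.
model = torch.nn.Conv2d(1, 16, kernel_size=3)  # placeholder, not the paper's network
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
```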