Musical Composition Style Transfer via Disentangled Timbre Representations
Authors: Yun-Ning Hung, I-Tung Chiang, Yi-An Chen, Yi-Hsuan Yang
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of the models by experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle. |
| Researcher Affiliation | Collaboration | Yun-Ning Hung¹, I-Tung Chiang¹, Yi-An Chen² and Yi-Hsuan Yang¹; ¹Research Center for IT Innovation, Academia Sinica, Taiwan; ²KKBOX Inc., Taiwan |
| Pseudocode | No | The paper describes the architecture of the proposed models and their components, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle. |
| Open Datasets | Yes | We use the newest released MuseScore dataset [Hung et al., 2019] to train the proposed models. This dataset contains 344,166 paired MIDI and MP3 files. |
| Dataset Splits | Yes | We train additional instrument classifiers Dt with the pre-defined training split of the M&M dataset (200 songs) [Hung et al., 2019]. We use the estimate of Dt as the predicted instrument roll. Table 1 shows the evaluation result on the pre-defined test split of the M&M dataset (69 songs) [Hung et al., 2019] of four state-of-the-art models (i.e., the first four rows) and our models (the middle two rows), considering only the five most popular instruments as in [Hung et al., 2019]. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions using the 'librosa library' and the 'pypianoroll package' but does not specify their version numbers, nor the versions of any other software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The initial learning rate is set to 0.005. We compute CQT with the librosa library [McFee et al., 2015], with 16,000 Hz sampling rate and 512-sample window size, again with no overlaps. We use a frequency scale of 88 bins, with 12 bins per octave to represent each note. Hence, F = 88 (bins) and T = 312 (frames). Both DuoED and UnetED are trained using stochastic gradient descent with momentum 0.9. |
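
The Experiment Setup row above quotes the paper's CQT parameters (16,000 Hz sampling rate, 512-sample windows with no overlap, 88 bins, 12 bins per octave). The following is a minimal sketch of how such an input feature could be computed with librosa; the file name and the choice of A0 as the lowest bin are assumptions, and the magnitude/log-scaling step is not stated in the paper.

```python
import librosa
import numpy as np

# Hypothetical input file; the dataset pairs MP3 audio with MIDI.
AUDIO_PATH = "example.mp3"

# Load audio at the 16,000 Hz sampling rate quoted in the Experiment Setup row.
y, sr = librosa.load(AUDIO_PATH, sr=16000)

# Constant-Q transform with a 512-sample hop, 88 frequency bins,
# and 12 bins per octave (one bin per semitone).
cqt = librosa.cqt(
    y,
    sr=sr,
    hop_length=512,
    n_bins=88,
    bins_per_octave=12,
    fmin=librosa.note_to_hz("A0"),  # assumption: piano range starting at A0
)

# Magnitude spectrogram; the paper does not specify further post-processing.
cqt_mag = np.abs(cqt)

# With F = 88 bins and T = 312 frames, one training example covers roughly
# 312 * 512 / 16000 ~= 10 seconds of audio.
print(cqt_mag.shape)  # (88, n_frames); slice to (88, 312) per excerpt
```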
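The same row also gives the optimizer settings (SGD with momentum 0.9, initial learning rate 0.005) but not the framework. Below is a hedged sketch of those settings in PyTorch; the framework choice, the placeholder network standing in for the paper's DuoED/UnetED encoder-decoders, and the MSE loss are all assumptions for illustration only.

```python
import torch

# Hypothetical stand-in for the paper's encoder-decoder networks (DuoED / UnetED).
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 32, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(32, 1, kernel_size=3, padding=1),
)

# Optimizer hyperparameters quoted from the paper: SGD, momentum 0.9,
# initial learning rate 0.005. PyTorch itself is an assumption here.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def training_step(batch_cqt, batch_target):
    """One illustrative update; the real loss terms are defined in the paper/repo."""
    optimizer.zero_grad()
    pred = model(batch_cqt)  # batch_cqt: (N, 1, 88, 312) CQT patches
    loss = torch.nn.functional.mse_loss(pred, batch_target)  # placeholder loss
    loss.backward()
    optimizer.step()
    return loss.item()
```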