Musical Composition Style Transfer via Disentangled Timbre Representations

Authors: Yun-Ning Hung, I-Tung Chiang, Yi-An Chen, Yi-Hsuan Yang

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the effectiveness of the models by experiments on instrument activity detection and composition style transfer. To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle.
Researcher Affiliation | Collaboration | Yun-Ning Hung (1), I-Tung Chiang (1), Yi-An Chen (2), and Yi-Hsuan Yang (1); (1) Research Center for IT Innovation, Academia Sinica, Taiwan; (2) KKBOX Inc., Taiwan
Pseudocode | No | The paper describes the architecture of the proposed models and their components, but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | To facilitate follow-up research, we open source our code at https://github.com/biboamy/instrument-disentangle.
Open Datasets | Yes | We use the newly released MuseScore dataset [Hung et al., 2019] to train the proposed models. This dataset contains 344,166 paired MIDI and MP3 files.
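As a quick illustration of how such paired symbolic/audio data is typically consumed, the sketch below loads one MIDI file into a piano-roll representation with pypianoroll, the package the paper mentions. This is a minimal sketch, not the authors' pipeline; it assumes pypianoroll ≥ 1.0 and uses a hypothetical file path.

```python
# Minimal sketch (not from the paper): inspect one MIDI file from a
# MuseScore-style paired MIDI/MP3 dataset using pypianoroll >= 1.0.
import pypianoroll

multitrack = pypianoroll.read("musescore/0001.mid")  # hypothetical path
for track in multitrack.tracks:
    # Each track carries a (time, 128) piano-roll matrix plus instrument metadata.
    print(track.name, track.program, track.is_drum, track.pianoroll.shape)
```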
Dataset Splits | Yes | We train additional instrument classifiers Dt with the pre-defined training split of the M&M dataset (200 songs) [Hung et al., 2019]. We use the estimate of Dt as the predicted instrument roll. Table 1 shows the evaluation result on the pre-defined test split of the M&M dataset (69 songs) [Hung et al., 2019] for four state-of-the-art models (i.e., the first four rows) and our models (the middle two rows), considering only the five most popular instruments, as in [Hung et al., 2019].
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using the 'librosa library' and the 'pypianoroll package' but does not specify their version numbers, nor any other software dependencies with versions (e.g., Python, PyTorch, TensorFlow).
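Since the report flags the missing version numbers, one way a replication can at least record the environment it actually ran with is sketched below; this is a generic snippet using Python's importlib.metadata, not something taken from the paper.

```python
# Record the versions of the libraries the paper names, since no
# versions are pinned in the paper itself.
from importlib.metadata import version

for pkg in ("librosa", "pypianoroll"):
    print(pkg, version(pkg))
```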
Experiment Setup | Yes | The initial learning rate is set to 0.005. We compute CQT with the librosa library [McFee et al., 2015], with 16,000 Hz sampling rate and 512-sample window size, again with no overlaps. We use a frequency scale of 88 bins, with 12 bins per octave to represent each note. Hence, F = 88 (bins) and T = 312 (frames). Both DuoED and UnetED are trained using stochastic gradient descent with momentum 0.9.
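The quoted settings are concrete enough to sketch the input features and optimizer configuration. The snippet below is a minimal, hedged reconstruction: the audio path, the 10-second excerpt length, the A0 lower bound for the 88-bin scale, and the placeholder model are assumptions rather than details confirmed by the paper, and PyTorch is used only to illustrate the stated SGD hyperparameters.

```python
# Minimal sketch of the reported feature and optimizer settings.
# Assumptions: hypothetical audio file, 10-second excerpt, fmin = A0,
# placeholder model; the paper does not confirm PyTorch.
import librosa
import numpy as np
import torch

# CQT at 16,000 Hz, 512-sample hop (no overlap), 88 bins, 12 bins per octave;
# a 10-second excerpt gives roughly T = 312 frames (16000 * 10 / 512 ≈ 312),
# matching F = 88 and T = 312 in the quoted setup.
y, sr = librosa.load("example.mp3", sr=16000, duration=10.0)  # hypothetical file
cqt = np.abs(
    librosa.cqt(y, sr=sr, hop_length=512, n_bins=88, bins_per_octave=12,
                fmin=librosa.note_to_hz("A0"))
)
print(cqt.shape)  # (88, ~313)

# SGD with momentum 0.9 and initial learning rate 0.005, as reported.
model = torch.nn.Conv2d(1, 16, kernel_size=3)  # placeholder, not the paper's network
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
```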