Dilated Convolution with Dilated GRU for Music Source Separation

Authors: Jen-Yu Liu, Yi-Hsuan Yang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to verify the capability of the proposed model, as well as the relative importance of its components. We also investigate how the D2 blocks work. Our evaluation shows that our model (GRU dilation=1) outperforms the state-of-the-art models for separating vocals and accompaniments.
Researcher Affiliation | Academia | Jen-Yu Liu and Yi-Hsuan Yang, Research Center for IT Innovation, Academia Sinica (jenyuliu.tw@gmail.com, yang@citi.sinica.edu.tw)
Pseudocode | No | The paper provides mathematical equations and network diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We evaluate the proposed model for music source separation on the MUSDB18 dataset (https://sigsep.github.io/datasets/musdb.html) used in SiSEC2018 [Stöter et al., 2018]. MUSDB18 contains 100 songs for training and 50 songs for evaluation, all with 44,100 Hz sampling rate.
Dataset Splits | Yes | 1/10 of the training set is used for validation.
Hardware Specification | No | The paper does not specify any details about the hardware (e.g., CPU, GPU models) used for the experiments.
Software Dependencies | No | The paper mentions PyTorch and the Adam optimizer but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Mean square error is used as the loss function for updating the network, and Adam [Kingma and Ba, 2015] is used to update the weights. ... Batch sizes 20 and 5 are used for 5-sec and 20-sec training respectively so that they have roughly the same number of weight updating in training. ... The weights in the epoch with the best validation loss during 500-epoch training are kept for a model. ... complex spectrograms are derived by applying a short-term Fourier transform (STFT) to waveforms, with 4,096-sample window size and 3/4 overlapping.
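The Open Datasets, Dataset Splits, and Experiment Setup rows report several concrete numbers. The plain-Python sketch below works through the arithmetic they imply. It is not the authors' code, which was not released, and every function name in it is hypothetical. It also rests on assumptions the paper does not state: the STFT is unpadded and uncentered, the spectrum is one-sided, and epochs are numbered from 1.

```python
"""Hypothetical sketch of the arithmetic implied by the reported setup.

Not the authors' implementation (none was released); names are illustrative.
"""

SAMPLE_RATE = 44_100  # MUSDB18 sampling rate in Hz
WINDOW = 4_096        # STFT window size in samples
HOP = WINDOW // 4     # 3/4 overlap leaves a hop of 1/4 window: 1,024 samples


def num_frequency_bins(window: int = WINDOW) -> int:
    """Bins in a one-sided spectrum of a real signal (assumed)."""
    return window // 2 + 1


def num_frames(n_samples: int, window: int = WINDOW, hop: int = HOP) -> int:
    """STFT frame count, assuming no padding or centering of the signal."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop


def audio_seconds_per_batch(clip_seconds: int, batch_size: int) -> int:
    """Total seconds of audio covered by one weight update."""
    return clip_seconds * batch_size


def validation_split(n_songs: int, denominator: int = 10) -> tuple[int, int]:
    """Hold out 1/denominator of the training songs for validation."""
    n_val = n_songs // denominator
    return n_songs - n_val, n_val


def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def best_epoch(val_losses: list[float]) -> int:
    """1-based epoch with the lowest validation loss (earliest on ties)."""
    return 1 + min(range(len(val_losses)), key=val_losses.__getitem__)


if __name__ == "__main__":
    print("hop:", HOP, "bins:", num_frequency_bins())
    print("frames (5 s):", num_frames(5 * SAMPLE_RATE))
    print("frames (20 s):", num_frames(20 * SAMPLE_RATE))
    print("audio per batch (s):", audio_seconds_per_batch(5, 20), audio_seconds_per_batch(20, 5))
    print("train/val songs:", validation_split(100))
```

Under these assumptions, a 5-second clip yields 212 frames of 2,049 bins and a 20-second clip yields 858 frames. A centered or padded STFT would produce slightly more frames. Both batch configurations (20 × 5 s and 5 × 20 s) cover 100 seconds of audio per update. This is consistent with the paper's remark about a similar number of weight updates only if an epoch covers the same total amount of audio in both settings, which the paper does not state explicitly. The 1/10 split leaves 90 training songs and 10 validation songs.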