Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data

Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

AAAI 2022, pp. 4441-4449

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the separation performance, we test our model on MUSDB18 while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.
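For reference, the SDR metric quoted above is the ratio SDR = 10 log10(||s||^2 / ||s - s_hat||^2) between a reference source s and its estimate s_hat. A minimal NumPy sketch of the global variant follows; the paper's MUSDB18 scores follow the SiSEC protocol, whose museval toolkit computes a windowed variant, so this helper is illustrative only.

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Global Source-to-Distortion Ratio in dB. Simplified: museval (the
    SiSEC toolkit) computes a windowed variant over one-second frames."""
    signal_power = np.sum(reference ** 2)
    distortion_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (distortion_power + eps))
```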
Researcher Affiliation | Collaboration | Ke Chen (1*), Xingjian Du (2*), Bilei Zhu (2), Zejun Ma (2), Taylor Berg-Kirkpatrick (1), Shlomo Dubnov (1). Affiliations: (1) University of California San Diego, CA, USA; (2) Bytedance AI Lab, Shanghai, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The official code is available at https://git.io/JDWQ5
Open Datasets | Yes | We choose AudioSet to train our sound event detection system ST-SED. It is a large-scale collection of over 2 million 10-second audio samples, labeled with sound events from a set of 527 labels. (...) MUSDB18 (Rafii et al. 2017) (...) AudioSet (Gemmeke et al. 2017) (...) DESED test set (Serizel et al. 2020)
Dataset Splits | Yes | We train our audio separator on the AudioSet full-train set, validate it on the AudioSet evaluation set, and evaluate it on the MUSDB18 test set, following the 6th community-based Signal Separation Evaluation Campaign (SiSEC 2018). (...) During the validation stage, we follow the same sampling paradigm to construct 5096 audio pairs from the AudioSet evaluation set and fix these pairs.
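The 5096 fixed validation pairs suggest a sampler whose output is frozen once and reused across validation runs. The sketch below is a hypothetical reconstruction: the pair count comes from the quote, while the seeded RNG and the two-clip pairing are assumptions.

```python
import random

def build_validation_pairs(eval_clips: list[str], n_pairs: int = 5096,
                           seed: int = 0) -> list[tuple[str, str]]:
    """Freeze n_pairs (source, interference) clip pairs drawn from the
    AudioSet evaluation set so every validation run scores identical
    mixtures. The seed is an assumption; the paper only says the pairs
    are constructed once and fixed."""
    rng = random.Random(seed)
    return [tuple(rng.sample(eval_clips, 2)) for _ in range(n_pairs)]
```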
Hardware Specification | Yes | We implement the ST-SED in PyTorch, train it with a batch size of 128 and the AdamW optimizer (...) on 8 NVIDIA Tesla V100 GPUs in parallel.
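A sketch of that reported setup, assuming PyTorch's DataParallel wrapper (the paper does not state which parallelism API was used, and the stand-in module here is hypothetical, not the ST-SED architecture):

```python
import torch
import torch.nn as nn

# Stand-in module; the actual ST-SED architecture is not reproduced here.
model = nn.Linear(527, 527)

# Mirror the reported setup: one process, batch size 128, replicated
# across all visible GPUs (8 Tesla V100s in the paper).
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```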
Software Dependencies | No | The paper mentions "PyTorch" and the "AdamW optimizer" but does not provide specific version numbers for these software components.
Experiment Setup | Yes | We implement the ST-SED in PyTorch, train it with a batch size of 128 and the AdamW optimizer (β1=0.9, β2=0.999, eps=1e-8, decay=0.05) (...) We adopt a warmup schedule by setting the learning rate to 0.05, 0.1, 0.2 in the first three epochs; then the learning rate is halved every ten epochs until it returns to 0.05.
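Read literally, that schedule warms up over the first three epochs and then halves the learning rate every ten epochs with a floor of 0.05. A sketch with AdamW and LambdaLR, using the hyperparameters exactly as quoted (the exact epoch boundaries after warmup are an assumed reading):

```python
import torch
import torch.nn as nn

model = nn.Linear(527, 527)  # stand-in; not the ST-SED architecture

# Optimizer hyperparameters as quoted from the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=0.2,
                              betas=(0.9, 0.999), eps=1e-8, weight_decay=0.05)

def lr_factor(epoch: int) -> float:
    """Multiplier on the base lr (0.2): warmup 0.05 -> 0.1 -> 0.2 over the
    first three epochs, then halve every ten epochs, never below 0.05."""
    warmup = (0.05, 0.1, 0.2)
    if epoch < 3:
        return warmup[epoch] / 0.2
    return max(0.2 * 0.5 ** ((epoch - 3) // 10), 0.05) / 0.2

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
```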