Learning to Separate Voices by Spatial Regions

Authors: Alan Xu, Romit Roy Choudhury

ICML 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Results show promising performance, underscoring the importance of personalization over a generic supervised approach." |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Illinois, US. |
| Pseudocode | Yes | "Algorithm 1 presents the pseudo code; we explain the key steps below." |
| Open Source Code | No | The paper mentions "audio samples available at our project website" (https://uiuc-earable-computing.github.io/binaural/), but this link is for audio samples, not for the source code of the methodology. There is no explicit statement about releasing the source code. |
| Open Datasets | Yes | "For supervised region-based separation, we use the CIPIC HRTF database (Algazi et al., 2001). ... We use the LibriMix dataset (Cosentino et al., 2020), sampled at 16 kHz..." |
| Dataset Splits | Yes | "With the script used in (Dovrat et al., 2021), Libri5Mix is used for training and validation, while Libri2Mix, Libri3Mix, Libri4Mix, and Libri5Mix are used for testing." (Split mapping sketched below.) |
| Hardware Specification | Yes | "The model is trained on 4 1080ti GPUs using the ADAM optimizer with batch size 4." (Training setup sketched below.) |
| Software Dependencies | No | The paper mentions software components such as the "ADAM optimizer", "STFT", and "Hanning window", and refers to a "feature concatenation TasNet", but does not specify version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | "To configure the feature-concatenation TasNet, we set N = 512, L = 32, B = 128, Sc = 128, P = 3, X = 8, R = 3, following the convention in (Luo & Mesgarani, 2019). ... We set f_aliasing = 562 Hz, which is about the 36th bin in the FFT. We set α = 5, σ_th = 0.00007 seconds; this value was set empirically based on our discussion of Figure 4." (Configuration sketched below.) |