Learning to Separate Voices by Spatial Regions
Authors: Alan Xu, Romit Roy Choudhury
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results show promising performance, underscoring the importance of personalization over a generic supervised approach. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Illinois, US. |
| Pseudocode | Yes | Algorithm 1 presents the pseudo code; we explain the key steps below. |
| Open Source Code | No | The paper mentions "audio samples available at our project website" (https://uiuc-earable-computing.github.io/binaural/), but this link is given for audio samples, not for the source code of the method. The paper makes no explicit statement about releasing the source code. |
| Open Datasets | Yes | For supervised region-based separation, we use the CIPIC HRTF database (Algazi et al., 2001). ... We use the LibriMix dataset (Cosentino et al., 2020), sampled at 16 kHz... |
| Dataset Splits | Yes | With the script used in (Dovrat et al., 2021), Libri5Mix is used for training and validation, while Libri2Mix, Libri3Mix, Libri4Mix, and Libri5Mix are used for testing. |
| Hardware Specification | Yes | The model is trained on 4 1080ti GPUs using the ADAM optimizer with batch size 4. |
| Software Dependencies | No | The paper mentions software components such as "ADAM optimizer", "STFT", and "Hanning window", and refers to a "feature concatenation TasNet", but does not specify version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | To configure the feature concatenation TasNet, we set N = 512, L = 32, B = 128, Sc = 128, P = 3, X = 8, R = 3, following the convention in (Luo & Mesgarani, 2019). ... We set f_aliasing = 562 Hz, which is about the 36th bin in the FFT. We set α = 5, σ_th = 0.00007 second; this value was set empirically based on our discussion on Figure 4. |
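
The Experiment Setup values can be collected into a small configuration sketch, which also gives a quick sanity check of the "36th bin" statement. The table does not state the FFT size, so the 1024-point value below is an assumption chosen because it reproduces bin 36 at the 16 kHz sampling rate; the names `tasnet_config`, `ASSUMED_FFT_SIZE`, and `frequency_to_bin` are hypothetical and do not come from the paper.

```python
# Hyperparameters quoted in the Experiment Setup row
# (symbols follow the convention of Luo & Mesgarani, 2019).
tasnet_config = {"N": 512, "L": 32, "B": 128, "Sc": 128, "P": 3, "X": 8, "R": 3}

SAMPLE_RATE_HZ = 16_000    # from the Open Datasets row
F_ALIASING_HZ = 562        # from the Experiment Setup row
SIGMA_TH_SECONDS = 0.00007 # from the Experiment Setup row
ASSUMED_FFT_SIZE = 1024    # ASSUMPTION: the FFT size is not given in the table


def frequency_to_bin(freq_hz, sample_rate_hz, fft_size):
    """Return the index of the FFT bin nearest to freq_hz."""
    bin_width_hz = sample_rate_hz / fft_size
    return round(freq_hz / bin_width_hz)


# Under the assumed FFT size, each bin spans 15.625 Hz.
aliasing_bin = frequency_to_bin(F_ALIASING_HZ, SAMPLE_RATE_HZ, ASSUMED_FFT_SIZE)

# The threshold expressed in samples at 16 kHz (a little over one sample).
sigma_th_samples = SIGMA_TH_SECONDS * SAMPLE_RATE_HZ

print(aliasing_bin, sigma_th_samples)
```

If the paper uses a different FFT size, the bin index scales proportionally, so the check only confirms that the quoted numbers are mutually consistent under this assumption.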