The Cone of Silence: Speech Separation by Localization

Authors: Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
Researcher Affiliation | Academia | Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman; University of Washington; {tjenrung, vjayaram, seitz, kemelmi}@cs.washington.edu
Pseudocode | Yes | Algorithm 1: Separation by Localization via Binary Search
Open Source Code | No | The paper provides links to a project website (https://grail.cs.washington.edu/projects/cone-of-silence/) for audio demos and supplementary videos, and mentions "supplementary results," but does not explicitly state that the source code for the methodology is available or provide a direct link to a code repository.
Open Datasets | Yes | All voices come from the VCTK dataset [57], and the background samples consist of recordings from either noisy restaurant environments or loud music. ... We chose VCTK over other widely used datasets like LibriSpeech [58] and WSJ0 [59] because VCTK is available at a high sampling rate of 48 kHz compared to 16 kHz as offered by others.
Dataset Splits | No | The paper states, "For training our network, we use 10,000 examples with N chosen uniformly between 1 and 4, inclusively, at random, and for evaluating we use 1,000 examples with N dependent on the evaluation task." While it specifies training and evaluation (test) set sizes, it does not explicitly mention a separate validation set or its size.
Hardware Specification | No | The paper mentions that "A forward pass of the network on a single GPU takes 0.03 s for a 3 s input waveform at 44.1 kHz" but does not provide specific details about the GPU model, CPU, or other hardware components used for the experiments.
Software Dependencies | No | The paper mentions software components such as the "pyroomacoustics library [61]" and refers to the "Demucs architecture [53]" and "WaveNet [56]," but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | In our experiments, we use h of size 5 corresponding to window sizes from the set {90°, 45°, 23°, 12°, 2°}. ... For training our network, we use 10,000 examples... All signals are convolved with the corresponding RIRs and rendered to a 6-channel circular microphone array (M = 6) of radius 2.85 in (7.25 cm). ... To synthesize a single example, we create a 3-second mixture at 44.1 kHz...
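The Pseudocode row cites Algorithm 1 (separation by localization via binary search), and the Experiment Setup row lists five window sizes, 90, 45, 23, 12, and 2, read here as angular widths in degrees. The sketch below shows one way a coarse-to-fine search over azimuth with those widths could be organized. It is not the authors' algorithm: `energy_in_window` is a hypothetical callback standing in for running the separation network conditioned on one angular window and measuring the energy of its output, and both the threshold test and the rule of splitting each surviving window into ceil(parent/child) sub-windows are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Angular window widths, coarse to fine (values from the text; reading them
# as degrees of azimuth is an assumption).
WINDOW_SIZES: Tuple[float, ...] = (90.0, 45.0, 23.0, 12.0, 2.0)

Window = Tuple[float, float]  # (center_deg, width_deg)


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def coarse_to_fine_search(
    energy_in_window: Callable[[float, float], float],
    threshold: float,
    widths: Sequence[float] = WINDOW_SIZES,
) -> List[Window]:
    """Hypothetical coarse-to-fine search over azimuth.

    energy_in_window(center, width) stands in for running a separation
    network conditioned on that angular window and measuring output energy.
    """
    # Level 0: tile the full circle with windows of the coarsest width.
    first = widths[0]
    candidates: List[Window] = [
        ((i + 0.5) * first, first) for i in range(math.ceil(360.0 / first))
    ]
    survivors: List[Window] = []
    for level in range(len(widths)):
        # Prune windows whose (hypothetical) separated output is too quiet.
        survivors = [
            (c, w) for c, w in candidates if energy_in_window(c, w) > threshold
        ]
        if level + 1 == len(widths):
            break
        # Split each surviving window into sub-windows of the next width.
        child = widths[level + 1]
        candidates = []
        for c, w in survivors:
            start = c - w / 2.0
            for j in range(math.ceil(w / child)):
                candidates.append(((start + (j + 0.5) * child) % 360.0, child))
    return survivors


if __name__ == "__main__":
    # Toy oracle: "energy" is the number of known sources inside the window.
    sources = [30.0, 200.0]

    def oracle(center: float, width: float) -> float:
        return float(sum(angular_distance(s, center) <= width / 2.0 for s in sources))

    for center, width in coarse_to_fine_search(oracle, threshold=0.5):
        print(f"source near {center:.1f} deg (window {width:.0f} deg)")
```

With the toy sources at 30° and 200°, the search returns 2° windows centered at 30°, 199°, and 201°: the source at 200° sits exactly on the shared edge of two fine windows, so both fire. A real pipeline would need some merging step for such duplicates, which this sketch omits.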
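The Dataset Splits row quotes 10,000 training examples with N drawn uniformly from 1 to 4 (inclusive) and 1,000 evaluation examples whose N depends on the evaluation task. The sketch below draws per-example configurations to match those counts. The names `ExampleConfig` and `sample_configs`, the uniform azimuth distribution, the even split between restaurant and music backgrounds, the seeds, and the `fixed_n` option for evaluation sets are assumptions for illustration; the paper does not describe these details, and no minimum angular separation between speakers is enforced here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExampleConfig:
    num_speakers: int          # N, uniform over {1, 2, 3, 4} for training (from the text)
    azimuths_deg: List[float]  # one azimuth per speaker, assumed uniform in [0, 360)
    background: str            # "restaurant" or "music" (categories from the text)


def sample_configs(count: int, seed: int = 0, fixed_n: Optional[int] = None) -> List[ExampleConfig]:
    """Hypothetical sampler; fixed_n pins N, e.g. for a task-specific evaluation set."""
    rng = random.Random(seed)  # private generator so results are reproducible
    configs = []
    for _ in range(count):
        n = fixed_n if fixed_n is not None else rng.randint(1, 4)  # randint includes both ends
        azimuths = [rng.uniform(0.0, 360.0) for _ in range(n)]
        background = rng.choice(["restaurant", "music"])
        configs.append(ExampleConfig(n, azimuths, background))
    return configs


train_configs = sample_configs(10_000, seed=0)
eval_configs = sample_configs(1_000, seed=1, fixed_n=2)  # N = 2 chosen only as an example
```

Keeping the evaluation sampler separate from the training one mirrors the quoted setup, where N is random during training but set by the task during evaluation.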
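Several rows quote rates and durations that translate directly into concrete numbers: VCTK speech at 48 kHz, 3-second mixtures at 44.1 kHz, and a 0.03 s single-GPU forward pass on a 3 s input. The snippet works through that arithmetic. The paper does not say how the 48 kHz speech is converted to 44.1 kHz; 147/160 is simply the ratio any such conversion implies, and a polyphase resampler such as `scipy.signal.resample_poly` is only one possible choice, mentioned in a comment as an assumption.

```python
from fractions import Fraction

VCTK_RATE_HZ = 48_000   # VCTK sampling rate (from the text)
MIX_RATE_HZ = 44_100    # mixture sampling rate (from the text)
CLIP_SECONDS = 3        # mixture length (from the text)
NUM_CHANNELS = 6        # M = 6 microphones (from the text)
FORWARD_PASS_S = 0.03   # single-GPU forward pass on a 3 s input (from the text)

# Samples per channel in one mixture, and values across all channels.
samples_per_channel = MIX_RATE_HZ * CLIP_SECONDS
samples_per_example = samples_per_channel * NUM_CHANNELS

# Exact rate-conversion ratio from 48 kHz to 44.1 kHz, in lowest terms.
# Assumption: a polyphase resampler could apply it, e.g.
# scipy.signal.resample_poly(x, up=147, down=160).
resample_ratio = Fraction(MIX_RATE_HZ, VCTK_RATE_HZ)

# Fraction of the clip's duration spent on one forward pass.
real_time_factor = FORWARD_PASS_S / CLIP_SECONDS

print(samples_per_channel, samples_per_example, resample_ratio, real_time_factor)
```

A real-time factor of about 0.01 means one pass runs roughly 100 times faster than the audio lasts. A full localization that queries many angular windows multiplies that cost by the number of passes, unless the queries are batched.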
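The Experiment Setup row specifies a 6-channel circular microphone array of radius 7.25 cm. The sketch lays out such an array and computes far-field arrival delays for a source at a given azimuth, one geometric cue that multichannel localization can exploit. Equal angular spacing starting at 0°, a planar (2D) model, a speed of sound of 343 m/s, and the function names `circular_array` and `far_field_delays` are assumptions, not details taken from the paper.

```python
import math
from typing import List, Tuple

NUM_MICS = 6                # M = 6 (from the text)
RADIUS_M = 0.0725           # 7.25 cm (from the text)
SPEED_OF_SOUND_M_S = 343.0  # assumed, roughly air at 20 °C
SAMPLE_RATE_HZ = 44_100     # mixture rate (from the text)

Point = Tuple[float, float]


def circular_array(num_mics: int, radius: float, phi0: float = 0.0) -> List[Point]:
    """(x, y) positions of mics equally spaced on a circle (assumed layout)."""
    return [
        (radius * math.cos(phi0 + 2.0 * math.pi * m / num_mics),
         radius * math.sin(phi0 + 2.0 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def far_field_delays(mics: List[Point], azimuth_deg: float,
                     c: float = SPEED_OF_SOUND_M_S) -> List[float]:
    """Plane-wave arrival time at each mic relative to the array center, in seconds.

    A mic displaced toward the source hears it earlier, hence the minus sign.
    """
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    return [-(x * ux + y * uy) / c for x, y in mics]


mics = circular_array(NUM_MICS, RADIUS_M)
# For six equally spaced mics (a regular hexagon), neighbor spacing equals the radius.
adjacent_spacing_m = math.dist(mics[0], mics[1])
# Largest possible inter-mic delay: sound crossing the full diameter.
max_delay_samples = 2.0 * RADIUS_M / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ
delays_at_0_deg = far_field_delays(mics, 0.0)
```

Across the 14.5 cm aperture, the inter-channel delay never exceeds about 18.6 samples at 44.1 kHz under these assumptions, so direction information at this array size lives in small inter-channel offsets.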