The Cone of Silence: Speech Separation by Localization
Authors: Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise. |
| Researcher Affiliation | Academia | Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman; University of Washington; {tjenrung, vjayaram, seitz, kemelmi}@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1: Separation by Localization via Binary Search |
| Open Source Code | No | The paper provides links to a project website (https://grail.cs.washington.edu/projects/cone-of-silence/) for audio demos and supplementary videos, and mentions "supplementary results," but does not explicitly state that the source code for the methodology is available or provide a direct link to a code repository. |
| Open Datasets | Yes | All voices come from the VCTK dataset [57], and the background samples consist of recordings from either noisy restaurant environments or loud music. ... We chose VCTK over other widely used datasets like LibriSpeech [58] and WSJ0 [59] because VCTK is available at a high sampling rate of 48 kHz compared to 16 kHz as offered by others. |
| Dataset Splits | No | The paper states, "For training our network, we use 10,000 examples with N chosen uniformly between 1 and 4, inclusively, at random, and for evaluating we use 1,000 examples with N dependent on the evaluation task." While it specifies training and evaluation (test) set sizes, it does not explicitly mention a separate validation set or its size. |
| Hardware Specification | No | The paper mentions that "A forward pass of the network on a single GPU takes 0.03 s for a 3 s input waveform at 44.1 kHz" but does not provide specific details about the GPU model, CPU, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions software components such as the "pyroomacoustics library [61]" and refers to the "Demucs architecture [53]" and "WaveNet [56]" but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In our experiments, we use h of size 5 corresponding to window sizes from the set {90°, 45°, 23°, 12°, 2°}. ... For training our network, we use 10,000 examples... All signals are convolved with the corresponding RIRs and rendered to a 6-channel circular microphone array (M = 6) of radius 2.85 in (7.25 cm). ... To synthesize a single example, we create a 3-second mixture at 44.1 kHz... |
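
The numbers quoted in the Experiment Setup row can be turned into a small geometry check. The sketch below is illustrative only and is not the authors' code: it assumes the six microphones are evenly spaced on the circle and that the first one sits at angle 0, neither of which the quoted text states. The function names `circular_array_positions` and `clip_num_samples` are hypothetical helpers introduced here.

```python
import math

# Values quoted in the Experiment Setup row.
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz
CLIP_SECONDS = 3          # 3-second mixture
NUM_MICS = 6              # M = 6
ARRAY_RADIUS_M = 0.0725   # 7.25 cm


def circular_array_positions(num_mics, radius_m, start_angle_rad=0.0):
    """Return (x, y) positions, in metres, of microphones on a circle
    centred at the origin.

    Assumption (not stated in the quoted text): the microphones are evenly
    spaced, and the first one lies at start_angle_rad.
    """
    step = 2.0 * math.pi / num_mics
    return [
        (
            radius_m * math.cos(start_angle_rad + k * step),
            radius_m * math.sin(start_angle_rad + k * step),
        )
        for k in range(num_mics)
    ]


def clip_num_samples(seconds, sample_rate_hz):
    """Number of samples per channel in a clip of the given duration."""
    return int(round(seconds * sample_rate_hz))


mic_positions = circular_array_positions(NUM_MICS, ARRAY_RADIUS_M)
num_samples = clip_num_samples(CLIP_SECONDS, SAMPLE_RATE_HZ)

# Distance between two neighbouring microphones under the even-spacing assumption.
adjacent_spacing_m = math.dist(mic_positions[0], mic_positions[1])

for index, (x, y) in enumerate(mic_positions):
    print(f"mic {index}: x = {x:+.4f} m, y = {y:+.4f} m")
print(f"samples per channel: {num_samples}")
print(f"adjacent mic spacing: {adjacent_spacing_m:.4f} m")
```

Under the even-spacing assumption, six microphones form a regular hexagon, so neighbouring microphones are one radius apart. Each rendered example would then be a 6-channel signal with `num_samples` samples per channel.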