Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
The Cone of Silence: Speech Separation by Localization
Authors: Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise. |
| Researcher Affiliation | Academia | Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman, University of Washington, EMAIL |
| Pseudocode | Yes | Algorithm 1: Separation by Localization via Binary Search |
| Open Source Code | No | The paper provides links to a project website (https://grail.cs.washington.edu/projects/cone-of-silence/) for audio demos and supplementary videos, and mentions "supplementary results," but does not explicitly state that the source code for the methodology is available or provide a direct link to a code repository. |
| Open Datasets | Yes | All voices come from the VCTK dataset [57], and the background samples consist of recordings from either noisy restaurant environments or loud music. ... We chose VCTK over other widely used datasets like LibriSpeech [58] and WSJ0 [59] because VCTK is available at a high sampling rate of 48 kHz compared to 16 kHz as offered by others. |
| Dataset Splits | No | The paper states, "For training our network, we use 10,000 examples with N chosen uniformly between 1 and 4, inclusively, at random, and for evaluating we use 1,000 examples with N dependent on the evaluation task." While it specifies training and evaluation (test) set sizes, it does not explicitly mention a separate validation set or its size. |
| Hardware Specification | No | The paper mentions that "A forward pass of the network on a single GPU takes 0.03 s for a 3 s input waveform at 44.1 kHz" but does not provide specific details about the GPU model, CPU, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions software components such as the "pyroomacoustics library [61]" and refers to the "Demucs architecture [53]" and "WaveNet [56]" but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In our experiments, we use h of size 5 corresponding to window sizes from the set {90°, 45°, 23°, 12°, 2°}. ... For training our network, we use 10,000 examples... All signals are convolved with the corresponding RIRs and rendered to a 6-channel circular microphone array (M = 6) of radius 2.85 in (7.25 cm). ... To synthesize a single example, we create a 3-second mixture at 44.1 kHz... |
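
The array geometry and clip length quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: it assumes the six microphones are evenly spaced on the circle and that the first one sits at angle 0, neither of which is stated in the excerpt above. The function name `circular_array_positions` and all constant names are hypothetical.

```python
import math

# Values quoted in the Experiment Setup row.
NUM_MICS = 6             # M = 6
RADIUS_M = 0.0725        # 7.25 cm, expressed in metres
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz
DURATION_S = 3           # 3-second mixture


def circular_array_positions(num_mics, radius_m, start_angle_rad=0.0):
    """Return (x, y) coordinates in metres for mics on a circle.

    Assumption (not from the paper excerpt): mics are evenly spaced,
    starting at start_angle_rad and proceeding counter-clockwise.
    """
    step = 2 * math.pi / num_mics
    return [
        (
            radius_m * math.cos(start_angle_rad + i * step),
            radius_m * math.sin(start_angle_rad + i * step),
        )
        for i in range(num_mics)
    ]


mic_positions = circular_array_positions(NUM_MICS, RADIUS_M)

# Number of samples per channel in one synthesized example.
num_samples = SAMPLE_RATE_HZ * DURATION_S

# Cross-check the quoted unit conversion: 2.85 in is roughly 7.24 cm.
radius_from_inches_cm = 2.85 * 2.54

for index, (x, y) in enumerate(mic_positions):
    print(f"mic {index}: x = {x:+.4f} m, y = {y:+.4f} m")
print(f"samples per channel: {num_samples}")
print(f"2.85 in = {radius_from_inches_cm:.3f} cm")
```

Under these assumptions, each example is a 6-channel waveform of 132,300 samples per channel, and the inch value rounds to the quoted 7.25 cm to within a fraction of a millimetre.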