SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform

Authors: Yuhang He, Niki Trigoni, Andrew Markham

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the public DCASE dataset show the advantage of SoundDet on both segment-based and our newly proposed event-based evaluation system.
Researcher Affiliation | Academia | 1Department of Computer Science, University of Oxford, Oxford, United Kingdom. Email: firstname.lastname@cs.ox.ac.uk.
Pseudocode | No | The paper provides an architectural illustration in Table 4 and describes components, but does not include structured pseudocode or an algorithm block labeled as such.
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate SoundDet on TAU-NIGENS DCASE sound event detection and localization (SELD) (Adavanne et al., 2018) dataset.
Dataset Splits | Yes | We follow the official splits and use 1-6 folds for train and the remaining 7-8 folds (200 1-min recordings) for test.
Hardware Specification | Yes | We further report the average inference time of different methods to process a one-minute long audio in Table. 3, showing that it is almost twice as fast as EIN, and comparable to SELDNet. Inference time on Intel(R) Core(TM) i9-7920X CPU.
Software Dependencies | No | The paper describes optimizers (SGD), learning rates, and network architecture, but does not specify software dependencies (e.g., libraries, frameworks) with version numbers.
Experiment Setup | Yes | For the backbone training, we use SGD optimizer with an initial learning rate 0.5, the learning rate decays every 30 epochs with decay rate 0.7. ... H and W indicate dense proposal map height and width, respectively, in our experiment H = 60 and W = 60
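
The fold split and learning-rate schedule quoted in the Dataset Splits and Experiment Setup rows can be written out as a small sketch. This is an illustration only, not the authors' code: the function and constant names are hypothetical, and it assumes that "decays every 30 epochs with decay rate 0.7" means a staircase schedule in which the rate is multiplied by 0.7 once per 30 epochs, with epochs counted from 0.

```python
# Values are the ones quoted in the Dataset Splits and Experiment Setup rows.
# All names are illustrative and are not taken from the paper.
INITIAL_LR = 0.5                       # initial SGD learning rate
DECAY_RATE = 0.7                       # decay rate
DECAY_EVERY = 30                       # epochs between decays
TRAIN_FOLDS = frozenset(range(1, 7))   # folds 1-6
TEST_FOLDS = frozenset({7, 8})         # folds 7-8


def learning_rate(epoch: int) -> float:
    """Staircase decay (assumed): multiply by DECAY_RATE once every DECAY_EVERY epochs.

    Epochs are assumed to be counted from 0.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return INITIAL_LR * DECAY_RATE ** (epoch // DECAY_EVERY)


def split_of(fold: int) -> str:
    """Return 'train' or 'test' for a DCASE fold index (1-8)."""
    if fold in TRAIN_FOLDS:
        return "train"
    if fold in TEST_FOLDS:
        return "test"
    raise ValueError(f"unknown fold: {fold}")


if __name__ == "__main__":
    # Print the assumed learning rate at a few epochs around the decay steps.
    for epoch in (0, 29, 30, 60):
        print(epoch, learning_rate(epoch))
```

The paper does not state which framework was used. In PyTorch, for instance, the same staircase would correspond to torch.optim.lr_scheduler.StepLR with step_size=30 and gamma=0.7.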