DiffSED: Sound Event Detection with Denoising Diffusion

Authors: Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments on the Urban-SED and EPIC-Sounds datasets demonstrate that our model significantly outperforms existing alternatives, with 40+% faster convergence in training." |
| Researcher Affiliation | Academia | "¹University of Surrey, UK; ²Imperial College London, UK" |
| Pseudocode | Yes | "Algorithm 1: Training; Algorithm 2: Noise corruption" |
| Open Source Code | Yes | "Code: https://github.com/Surrey-UPLab/DiffSED" |
| Open Datasets | Yes | "Extensive experiments on the Urban-SED and EPIC-Sounds datasets... URBAN-SED (Salamon, Jacoby, and Bello 2014)... EPIC-Sounds (Huh et al. 2023)." |
| Dataset Splits | Yes | "Figure 3: Convergence rates for SEDT and DiffSED on the URBAN-SED dataset. The dotted lines represent the training epoch when the best-performing checkpoint (the one with the best audio-tagging F1 score on the validation set) arrived... For the EPIC-Sounds dataset, we report the top-1 and top-5 accuracy, as well as mean average precision (mAP), mean area under ROC curve (mAUC), and mean per-class accuracy (mCA) on the validation split, following the protocol of (Huh et al. 2023)." |
| Hardware Specification | Yes | "All models are trained with 2 NVIDIA A5500 GPUs." |
| Software Dependencies | No | The paper mentions the Adam optimizer and ResNet-50 and implies the use of a deep learning framework, but it does not specify versions for any software components, such as Python, PyTorch/TensorFlow, or other libraries. |
| Experiment Setup | Yes | "Our model is trained for 400 epochs, while re-initializing the weights from the best checkpoint for every 100 epochs, using Adam optimizer with an initial learning rate of 10⁻⁴ with a decay schedule of 10⁻². The batch size is set to 64 for URBAN-SED and 128 for EPIC-Sounds." |
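The Experiment Setup row quotes a somewhat unusual schedule: 400 total epochs with the weights re-initialized from the best checkpoint every 100 epochs. A minimal sketch of that loop is below, assuming PyTorch. The model is a placeholder (`nn.Linear`), and reading the "decay schedule of 10⁻²" as Adam weight decay is an interpretation on our part; the paper does not publish its training code for this part, so treat this as illustrative only.

```python
import torch
from torch import nn

# Placeholder for the actual DiffSED model (hypothetical; not the paper's architecture).
model = nn.Linear(128, 10)

# Adam with the quoted initial learning rate of 1e-4.
# Interpreting the quoted "decay schedule of 10^-2" as weight decay is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-2)

TOTAL_EPOCHS = 400   # "trained for 400 epochs"
REINIT_EVERY = 100   # "re-initializing the weights from the best checkpoint for every 100 epochs"

# Snapshot of the best-so-far weights (by validation audio-tagging F1 in the paper).
best_state = {k: v.clone() for k, v in model.state_dict().items()}

for epoch in range(TOTAL_EPOCHS):
    if epoch > 0 and epoch % REINIT_EVERY == 0:
        # Restart each 100-epoch block from the best checkpoint found so far.
        model.load_state_dict(best_state)
    # ... one training epoch would run here: batch size 64 for URBAN-SED,
    # 128 for EPIC-Sounds, then update best_state if validation F1 improves ...
```

The periodic restart from the best checkpoint acts as a cheap form of iterative refinement, which is consistent with the paper's reported fast convergence, though the exact checkpoint-selection logic is only described in prose.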