DiffSED: Sound Event Detection with Denoising Diffusion
Authors: Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the Urban-SED and EPIC-Sounds datasets demonstrate that our model significantly outperforms existing alternatives, with 40+% faster convergence in training. |
| Researcher Affiliation | Academia | ¹University of Surrey, UK; ²Imperial College London, UK |
| Pseudocode | Yes | Algorithm 1: Training; Algorithm 2: Noise corruption |
| Open Source Code | Yes | Code: https://github.com/Surrey-UPLab/DiffSED. |
| Open Datasets | Yes | Extensive experiments on the Urban-SED and EPIC-Sounds datasets... URBAN-SED (Salamon, Jacoby, and Bello 2014)... EPIC-Sounds (Huh et al. 2023). |
| Dataset Splits | Yes | Figure 3: Convergence rates for SEDT and DiffSED on the URBAN-SED dataset. The dotted lines represent the training epoch when the best-performing checkpoint (the one with the best audio-tagging F1 score on the validation set) arrived... For the EPIC-Sounds dataset, we report the top-1 and top-5 accuracy, as well as mean average precision (mAP), mean area under ROC curve (mAUC), and mean per class accuracy (mCA) on the validation split, following the protocol of (Huh et al. 2023). |
| Hardware Specification | Yes | All models are trained with 2 NVIDIA A5500 GPUs. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'ResNet-50' and implies the use of a deep learning framework, but it does not specify versions for any software components, such as Python, PyTorch/TensorFlow, or other libraries. |
| Experiment Setup | Yes | Our model is trained for 400 epochs, while re-initializing the weights from the best checkpoint for every 100 epochs, using Adam optimizer with an initial learning rate of 10⁻⁴ with a decay schedule of 10⁻². The batch size is set to 64 for URBAN-SED and 128 for EPIC-Sounds. |
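
The experiment setup quoted in the table can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `make_config`, and `restart_epochs` are hypothetical, the field `decay` simply records the quoted "decay schedule" value without assuming how it is applied, and the choice to skip a reload at the final epoch is an assumption.

```python
from dataclasses import dataclass

# Batch sizes as quoted in the Experiment Setup row.
BATCH_SIZES = {"URBAN-SED": 64, "EPIC-Sounds": 128}


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted above."""
    dataset: str
    batch_size: int
    total_epochs: int = 400    # "trained for 400 epochs"
    restart_every: int = 100   # reload best checkpoint "for every 100 epochs"
    initial_lr: float = 1e-4   # initial learning rate for Adam
    decay: float = 1e-2        # quoted "decay schedule"; exact usage is not specified


def make_config(dataset: str) -> TrainConfig:
    """Build a config for one of the two datasets named in the table."""
    return TrainConfig(dataset=dataset, batch_size=BATCH_SIZES[dataset])


def restart_epochs(cfg: TrainConfig) -> list[int]:
    """Completed-epoch counts after which the best checkpoint would be reloaded.

    Assumption: no reload happens at the final epoch, since training ends there.
    """
    return list(range(cfg.restart_every, cfg.total_epochs, cfg.restart_every))


if __name__ == "__main__":
    cfg = make_config("URBAN-SED")
    print(cfg)
    print(restart_epochs(cfg))
```

In a real training loop, the values from `restart_epochs` would mark where the model weights are reloaded from the checkpoint with the best validation score; how the optimizer state is handled at those points is not stated in the source.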