SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network
Authors: Yuhang He, Zhuangzhuang Dai, Niki Trigoni, Long Chen, Andrew Markham
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test DyDecNet on various datasets to show its superiority. We run experiments on a large number of sound datasets, including commonly heard bioacoustic, indoor and outdoor, real-world and synthetic sounds. Comprehensive experimental results show the superiority of our proposed framework in counting under different challenging acoustic scenarios. |
| Researcher Affiliation | Collaboration | Yuhang He1, Zhuangzhuang Dai2, Niki Trigoni1, Long Chen3,4*, Andrew Markham1. 1Department of Computer Science, University of Oxford, UK (yuhang.he@cs.ox.ac.uk). 2Department of Applied AI and Robotics, Aston University, UK. 3Institute of Automation, Chinese Academy of Sciences, China. 4WAYTOUS Ltd., China. |
| Pseudocode | No | The paper describes the architecture and processes, but it does not include any formal pseudocode blocks or figures explicitly labeled as pseudocode or an algorithm. |
| Open Source Code | No | The paper does not include any statements about open-sourcing the code or provide links to a code repository. |
| Open Datasets | Yes | We run experiments on five main datasets. Audio Set (Gemmeke et al. 2017) is a large temporally-strong labelled dataset... North East US (Chronister et al. 2021) dataset... We use the OpenMIC-2018 dataset (Humphrey, Durand, and McFee 2018) to count musical instruments. |
| Dataset Splits | Yes | Specifically, we train the model on the train split, which has 103,463 audio clips and 934,821 labels, and test the model on the evaluation split, which has 16,996 audio clips and 139,538 labels. |
| Hardware Specification | Yes | We train the models with PyTorch (Paszke et al. 2019) on a TITAN RTX GPU. |
| Software Dependencies | No | The paper mentions using "Pytorch (Paszke et al. 2019)" but does not specify the version number of PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We adopt the Adam optimizer (Kingma and Ba 2015) with an initial learning rate of 0.001, which decays every 20 epochs with a decay rate of 0.5. Overall, we train for 60 epochs. For the energy gain normalization we initialize α = 0.96, δ = 2.0, γ = 0.5, σ = 0.5. The batch size is 128. |
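
The reported experiment setup maps directly onto standard PyTorch training boilerplate. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: DyDecNet itself is not open-sourced, so `model` is a placeholder, and `EnergyGainNorm` is a hypothetical container used only to show the reported initial values of α, δ, γ, and σ as learnable parameters.

```python
# Minimal sketch of the reported training configuration (not the authors'
# code): Adam, initial LR 0.001, halved every 20 epochs, 60 epochs total,
# batch size 128.
import torch
import torch.nn as nn

class EnergyGainNorm(nn.Module):
    """Hypothetical container for the energy-gain normalization parameters,
    initialized to the values reported in the paper."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.96))
        self.delta = nn.Parameter(torch.tensor(2.0))
        self.gamma = nn.Parameter(torch.tensor(0.5))
        self.sigma = nn.Parameter(torch.tensor(0.5))

energy_gain = EnergyGainNorm()
model = nn.Linear(64, 1)  # placeholder standing in for DyDecNet

optimizer = torch.optim.Adam(
    list(model.parameters()) + list(energy_gain.parameters()), lr=1e-3
)
# "decays every 20 epochs with a decay rate of 0.5" maps onto StepLR.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(60):
    # ... iterate over a DataLoader with batch_size=128 and optimize the
    # counting loss here ...
    scheduler.step()
```

The step schedule is an assumption: the paper's wording ("decays every 20 epochs with a decaying rate 0.5") is consistent with `StepLR`, but the authors may have implemented the decay differently.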