SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network

Authors: Yuhang He, Zhuangzhuang Dai, Niki Trigoni, Long Chen, Andrew Markham

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test DyDecNet on various datasets to show its superiority. We run experiments on a large number of sound datasets, including commonly heard bioacoustic, indoor and outdoor, real-world and synthetic sounds. Comprehensive experimental results show the superiority of our proposed framework in counting under different challenging acoustic scenarios.
Researcher Affiliation | Collaboration | Yuhang He (1), Zhuangzhuang Dai (2), Niki Trigoni (1), Long Chen (3,4), Andrew Markham (1). (1) Department of Computer Science, University of Oxford, UK (yuhang.he@cs.ox.ac.uk); (2) Department of Applied AI and Robotics, Aston University, UK; (3) Institute of Automation, Chinese Academy of Sciences, China; (4) WAYTOUS Ltd., China.
Pseudocode | No | The paper describes the architecture and processes, but it does not include any formal pseudocode blocks or figures explicitly labeled as pseudocode.
Open Source Code | No | The paper does not include any statements about open-sourcing the code or provide links to a code repository.
Open Datasets | Yes | We run experiments on five main datasets. Audio Set (Gemmeke et al. 2017) is a large temporally-strong labelled dataset... North East US (Chronister et al. 2021) dataset... We use the OpenMIC-2018 dataset (Humphrey, Durand, and McFee 2018) to count musical instruments.
Dataset Splits | Yes | Specifically, we train the model on the train split, which has 103,463 audio clips and 934,821 labels, and test the model on the evaluation split, which has 16,996 audio clips and 139,538 labels.
Hardware Specification | Yes | We train the models with PyTorch (Paszke et al. 2019) on a TITAN RTX GPU.
Software Dependencies | No | The paper mentions using "Pytorch (Paszke et al. 2019)" but does not specify the version number of PyTorch or any other software dependencies.
Experiment Setup | Yes | We adopt the Adam optimizer (Kingma and Ba 2015) with an initial learning rate of 0.001, which decays every 20 epochs with a decay rate of 0.5. Overall, we train for 60 epochs. For the energy gain normalization we initialize α = 0.96, δ = 2.0, γ = 0.5, σ = 0.5. The batch size is 128.
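For readers who want to mirror the reported setup, below is a minimal PyTorch sketch of the optimization schedule described above (Adam, initial learning rate 0.001 halved every 20 epochs, 60 epochs, batch size 128). The placeholder module, the elided training pass, and the energy-gain variable names are illustrative assumptions; the paper's DyDecNet architecture and its counting loss are not reproduced here.

```python
# Minimal sketch of the reported training configuration, assuming PyTorch.
# The model is a placeholder standing in for DyDecNet, which is not shown.
import torch
import torch.nn as nn

model = nn.Conv1d(1, 16, kernel_size=9)  # placeholder, not the paper's network

# Hypothetical illustration of the reported energy-gain normalization init values.
alpha, delta, gamma, sigma = 0.96, 2.0, 0.5, 0.5

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial lr 0.001
# Halve the learning rate every 20 epochs, as reported.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

batch_size = 128
for epoch in range(60):  # 60 epochs in total
    # ... one pass over the training set with the batch size above ...
    scheduler.step()     # lr drops to 5e-4 after epoch 20 and 2.5e-4 after epoch 40
```

StepLR with gamma=0.5 matches the "decays every 20 epochs with a decay rate of 0.5" description; an equivalent manual schedule would work just as well.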