Learning to Recognize Transient Sound Events using Attentional Supervision

Authors: Szu-Yu Chou, Jyh-Shing Jang, Yi-Hsuan Yang

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that M&mnet works remarkably well for recognizing sound events, establishing a new state-of-the-art for DCASE17 and Audio Set data sets.
Researcher Affiliation | Academia | (1) Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan; (2) Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | For reproducibility, we will share the python source code and trained models online through a github repo (https://github.com/fearofchou/mmnet).
Open Datasets | Yes | The first set of experiments uses DCASE17, a subset of Audio Set that was used in DCASE2017 Challenge Task 4 [Mesaros et al., 2017]. ... The second set of experiments uses Audio Set, containing over 2M audio clips with 527 possible sound events. Google provides a balanced training set with at least 59 examples per class, called Audio Set-22K, and a balanced test set with again at least 59 examples per class, called Audio Set-20K.
Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a separate validation split with specific percentages, counts, or a defined method for creating it.
Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or memory.
Software Dependencies | No | The paper mentions using the 'librosa library' but does not provide a specific version number. No other software dependencies with version numbers are listed.
Experiment Setup | Yes | For optimization, we used SGD with a mini-batch size of 64 and initial learning rate 0.1. We divided the learning rate by 10 every 30 epochs and set the maximal number of epochs to 100. To avoid overfitting, we set the weight decay to 1e-4.
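
The learning-rate schedule quoted under Experiment Setup can be worked through with a short sketch. Only the numeric values (initial rate 0.1, division by 10 every 30 epochs, 100 epochs at most, batch size 64, weight decay 1e-4) come from the quoted text. The function names and the 0-indexed epoch convention are assumptions made for illustration, and this is not the authors' training code.

```python
# Hyperparameters as quoted in the Experiment Setup row.
INITIAL_LR = 0.1      # initial learning rate for SGD
DECAY_FACTOR = 10     # the learning rate is divided by this factor ...
DECAY_EVERY = 30      # ... every this many epochs
MAX_EPOCHS = 100      # maximal number of epochs
BATCH_SIZE = 64       # mini-batch size (listed for completeness)
WEIGHT_DECAY = 1e-4   # weight decay (listed for completeness)


def learning_rate(epoch):
    """Step-decayed learning rate for a given epoch.

    Assumption: epochs are counted from 0, so the first division by 10
    takes effect at epoch 30.
    """
    if not 0 <= epoch < MAX_EPOCHS:
        raise ValueError("epoch must be in [0, MAX_EPOCHS)")
    return INITIAL_LR / (DECAY_FACTOR ** (epoch // DECAY_EVERY))


def schedule_segments():
    """Return (first_epoch, last_epoch, learning_rate) for each constant stretch."""
    segments = []
    start = 0
    while start < MAX_EPOCHS:
        end = min(start + DECAY_EVERY, MAX_EPOCHS) - 1
        segments.append((start, end, learning_rate(start)))
        start += DECAY_EVERY
    return segments


if __name__ == "__main__":
    for first, last, lr in schedule_segments():
        print(f"epochs {first:3d}-{last:3d}: lr = {lr:g}")
```

Under these assumptions the rate is held constant over four stretches (epochs 0-29, 30-59, 60-89 and 90-99), dropping by a factor of 10 at each boundary, so the final ten epochs run at one thousandth of the initial rate.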