Learning to Recognize Transient Sound Events using Attentional Supervision
Authors: Szu-Yu Chou, Jyh-Shing Jang, Yi-Hsuan Yang
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that M&mnet works remarkably well for recognizing sound events, establishing a new state-of-the-art for DCASE17 and Audio Set data sets. |
| Researcher Affiliation | Academia | 1Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan 2Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | For reproducibility, we will share the python source code and trained models online through a github repo: https://github.com/fearofchou/mmnet |
| Open Datasets | Yes | The first set of experiments uses DCASE17, a subset of Audio Set that was used in DCASE2017 Challenge Task 4 [Mesaros et al., 2017]. ...The second set of experiments uses Audio Set, containing over 2M audio clips with 527 possible sound events. Google provides a balanced training set with at least 59 examples per class, called Audio Set-22K, and a balanced test set with again at least 59 examples per class, called Audio Set-20K. |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly describe a separate validation split with specific percentages, counts, or a defined method for creating it. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or memory. |
| Software Dependencies | No | The paper mentions using the 'librosa library' but does not provide a specific version number. No other software dependencies with version numbers are listed. |
| Experiment Setup | Yes | For optimization, we used SGD with a mini-batch size of 64 and initial learning rate 0.1. We divided the learning rate by 10 every 30 epochs and set the maximal number of epochs to 100. To avoid overfitting, we set the weight decay to 1e-4. |
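
The optimization settings quoted in the last row translate directly into a standard step-decay training schedule. Below is a minimal sketch, assuming a PyTorch implementation (the framework is not stated in this report); `model` is only a placeholder and not the actual M&mnet architecture.

```python
# Hedged sketch of the reported training setup, NOT the authors' released code.
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Sequential(nn.Linear(128, 527))  # placeholder model, not M&mnet

# SGD with mini-batch size 64, initial learning rate 0.1, weight decay 1e-4
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# Divide the learning rate by 10 every 30 epochs; train for at most 100 epochs
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    # ... one pass over the training data in mini-batches of 64 would go here ...
    scheduler.step()
```

Momentum and other optimizer details are not listed in this section, so they are omitted from the sketch.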