Learning from Between-class Examples for Deep Sound Recognition

Authors: Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that BC learning improves the performance on various sound recognition networks, datasets, and data augmentation schemes, in all of which BC learning proves beneficial. Furthermore, we construct a new deep sound recognition network (EnvNet-v2) and train it with BC learning. As a result, we achieved a performance that surpasses the human level. (An illustrative sketch of the between-class mixing appears below the table.)
Researcher Affiliation | Academia | Yuji Tokozume (1), Yoshitaka Ushiku (1), Tatsuya Harada (1, 2); (1) The University of Tokyo, (2) RIKEN; {tokozume,ushiku,harada}@mi.t.u-tokyo.ac.jp
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is publicly available at https://github.com/mil-tokyo/bc_learning_sound/.
Open Datasets | Yes | We used ESC-50, ESC-10 (Piczak, 2015b), and UrbanSound8K (Salamon et al., 2014) to train and evaluate the models.
Dataset Splits | Yes | We evaluated the performance of the methods using K-fold cross-validation (K = 5 for ESC-50 and ESC-10, and K = 10 for UrbanSound8K), using the original fold settings.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or memory) were mentioned for running the experiments.
Software Dependencies | Yes | Note that all networks and training codes are our implementation using Chainer v1.24 (Tokui et al., 2015).
Experiment Setup | Yes | All models were trained with Nesterov's accelerated gradient using a momentum of 0.9, weight decay of 0.0005, and mini-batch size of 64. The only difference in the learning settings between standard and BC learning is the number of training epochs. ... Table 3 shows the detailed learning settings of standard learning. We trained the model beginning with a learning rate of Initial LR, and then divided the learning rate by 10 at the epochs listed in LR schedule. To improve convergence, we used a 0.1x smaller learning rate for the first Warmup epochs. (A sketch of this schedule appears below the table.)
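
For context on the BC learning results quoted above, here is a minimal sketch of the between-class mixing idea: two training sounds from different classes are mixed with a random ratio, and the network is trained to output that ratio as a soft label. The simple linear mix and the helper name bc_mix are illustrative assumptions only; the paper's actual mixing additionally accounts for the sound pressure levels of the two waveforms.

import numpy as np

def bc_mix(x1, y1, x2, y2, num_classes):
    # x1, x2: waveforms from two different classes y1 != y2.
    # Returns a mixed waveform and a ratio (soft) label.
    r = np.random.uniform()                      # random mixing ratio in (0, 1)
    x = r * x1 + (1.0 - r) * x2                  # simplified linear mix (assumption)
    t = np.zeros(num_classes, dtype=np.float32)  # soft label encoding the ratio
    t[y1] = r
    t[y2] = 1.0 - r
    return x, t  # the network output is compared with t (e.g., with a KL-divergence loss)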
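
The learning-rate schedule described in the Experiment Setup row can be sketched as a step schedule in plain Python. The function name and the example values below are placeholders; the actual Initial LR, LR schedule, and Warmup values are per-dataset settings from the paper's Table 3, which is not reproduced here. In Chainer v1.24, the optimizer described above would typically correspond to NesterovAG with momentum 0.9 plus a WeightDecay(0.0005) hook.

def learning_rate(epoch, initial_lr, lr_schedule, warmup):
    # Start from initial_lr, divide by 10 at each epoch listed in lr_schedule,
    # and use a 0.1x smaller rate for the first `warmup` epochs.
    lr = initial_lr
    for milestone in lr_schedule:
        if epoch >= milestone:
            lr /= 10.0
    if epoch < warmup:
        lr *= 0.1
    return lr

# Illustration only; the real per-dataset values are listed in Table 3 of the paper:
# lr = learning_rate(epoch=0, initial_lr=0.1, lr_schedule=[300, 450], warmup=10)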