Learning Sound Events from Webly Labeled Data
Authors: Anurag Kumar, Ankit Shah, Alexander Hauptmann, Bhiksha Raj
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our proposed system, WeblyNet, two deep neural networks co-teach each other to robustly learn from webly labeled data, leading to around 17% relative improvement over the baseline method. (A hedged sketch of such joint training appears below the table.) |
| Researcher Affiliation | Academia | Anurag Kumar, Ankit Shah, Alexander Hauptmann and Bhiksha Raj, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. argxkr@gmail.com, {aps1, alex, bhiksha}@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 outlines this procedure. |
| Open Source Code | Yes | Please visit https://github.com/anuragkr90/webly-labeled-sounds for webly labeled data, codebase and additional analysis. |
| Open Datasets | Yes | We formed two datasets using the above strategy. The first one, referred to as Webly-2k, uses the top 50 retrieved videos for each class and has around 1,900 audio recordings. The second one, Webly-4k, uses the top 100 retrieved videos for each class and contains around 3,800 recordings. Please visit https://github.com/anuragkr90/webly-labeled-sounds for webly labeled data, codebase and additional analysis. |
| Dataset Splits | Yes | A subset of recordings from the Unbalanced set of Audio Set is used for validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions "All experiments are done in PyTorch toolkit" but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | N1 is trained on the first set of audio representations (X1). It is a deep CNN. The layer blocks B1 to B4 each consist of two convolutional layers followed by a max-pooling layer. The number of filters in both convolutional layers of these blocks is {B1: 64, B2: 128, B3: 256, B4: 256}. The convolutional filters are of size 3×3 in all cases, and convolution is done with a stride of 1. Padding of 1 is applied to the inputs of all convolutional layers. Max-pooling in these blocks uses a window of size 1×2, moving by the same amount. Layers F1 and F2 are again convolutional layers, with 1024 filters of size 1×8 and 1024 filters of size 1×1, respectively. All convolutional layers from B1 to F2 include batch normalization [Ioffe and Szegedy, 2015] and ReLU (max(0, x)) activations. The network N2 (with X2 as inputs) consists of 3 fully connected hidden layers with 2048, 1024 and 1024 neurons, respectively. The output layer contains C neurons. A dropout of 0.4 is applied after the first and second hidden layers. ReLU activation is used in all hidden layers and sigmoid in the output layer. The network is trained through Adam optimization [Kingma and Ba, 2014]. Hyperparameters are tuned using the validation set. (See the architecture sketch below the table.) |
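To make the setup concrete, here is a minimal PyTorch sketch of the two networks as described in the excerpt above. The input channel count, the tensor layout of X1, the feature dimensionality of X2, and the output head of N1 are not specified in the excerpt and are assumptions; the filter counts, kernel sizes, pooling, dropout, and activations follow the quoted description.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 conv layers (stride 1, padding 1), each with batch norm
    and ReLU, followed by 1x2 max pooling, per the B1-B4 description."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=(1, 2), stride=(1, 2)),
    )


class N1(nn.Module):
    """CNN on the first audio representation X1: blocks B1-B4 with
    {64, 128, 256, 256} filters, then F1 (1024 filters, 1x8) and
    F2 (1024 filters, 1x1), all with batch norm and ReLU."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 64),     # B1 (single input channel assumed)
            conv_block(64, 128),   # B2
            conv_block(128, 256),  # B3
            conv_block(256, 256),  # B4
            nn.Conv2d(256, 1024, kernel_size=(1, 8)),  # F1
            nn.BatchNorm2d(1024),
            nn.ReLU(inplace=True),
            nn.Conv2d(1024, 1024, kernel_size=1),      # F2
            nn.BatchNorm2d(1024),
            nn.ReLU(inplace=True),
        )
        # Output head: not described in the excerpt; a 1x1 conv to C
        # classes followed by global average pooling is an assumption.
        self.classifier = nn.Conv2d(1024, num_classes, kernel_size=1)

    def forward(self, x):  # x: (batch, 1, time, freq) -- assumed layout
        h = self.features(x)
        h = torch.sigmoid(self.classifier(h))
        return h.mean(dim=(2, 3))  # pool over the remaining grid


class N2(nn.Module):
    """MLP on the second representation X2: hidden layers of 2048, 1024,
    1024 units, dropout 0.4 after the first two, ReLU hidden activations,
    sigmoid outputs over C classes."""

    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 2048), nn.ReLU(inplace=True), nn.Dropout(0.4),
            nn.Linear(2048, 1024), nn.ReLU(inplace=True), nn.Dropout(0.4),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```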
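The excerpts say only that the two networks "co-teach each other"; one common way to realize such joint training is to fit both networks to the noisy web labels while adding a divergence penalty that pulls their posteriors toward agreement. The sketch below uses a symmetric MSE between posteriors and a hypothetical weight `alpha`; the paper's exact coupling term may differ.

```python
import torch
import torch.nn.functional as F


def joint_step(n1, n2, optimizer, x1, x2, y, alpha=0.1):
    """One joint training step: each network fits the (noisy) web labels
    with binary cross-entropy, plus an agreement term between the two
    networks' outputs. The MSE agreement term and the weight alpha are
    assumptions, not necessarily the paper's objective."""
    p1, p2 = n1(x1), n2(x2)
    loss = (F.binary_cross_entropy(p1, y)
            + F.binary_cross_entropy(p2, y)
            + alpha * F.mse_loss(p1, p2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage sketch: C, D2 (feature dim of X2) and the loader are placeholders.
# n1, n2 = N1(num_classes=C), N2(in_dim=D2, num_classes=C)
# optimizer = torch.optim.Adam(list(n1.parameters()) + list(n2.parameters()))
# for x1, x2, y in loader:
#     joint_step(n1, n2, optimizer, x1, x2, y)
```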