WSNet: Compact and Efficient Networks Through Weight Sampling

Authors: Xiaojie Jin, Yingzhen Yang, Ning Xu, Jianchao Yang, Nebojsa Jojic, Jiashi Feng, Shuicheng Yan

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on multiple audio classification datasets verify the effectiveness of WSNet. Combined with weight quantization, the resulting models are up to 180× smaller and theoretically up to 16× faster than the well-established baselines, without noticeable performance drop.
Researcher Affiliation | Collaboration | National University of Singapore, Singapore; Snap Inc. Research, Los Angeles, USA; Bytedance Inc., Menlo Park, USA; Microsoft Research, Redmond, USA; 360 AI Institute, Beijing, China.
Pseudocode | No | The paper describes mathematical formulations and processes but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/AIROBOTAI/wsnet-v1.
Open Datasets | Yes | On each test dataset, including ESC-50 (Piczak, 2015a), UrbanSound8K (Salamon et al., 2014), DCASE (Stowell et al., 2015) and MusicDet200K (a self-collected dataset, as detailed in Section 4), WSNet significantly reduces the model size of the baseline by 100× with comparable or even higher classification accuracy.
Dataset Splits | Yes | All results of WSNet are obtained by 10-fold cross-validation. [...] For each dataset, we hold out 20% of training samples to form a validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | WSNet is implemented and trained from scratch in TensorFlow (Abadi et al., 2016). While TensorFlow is mentioned, no specific version number for TensorFlow or any other software library is provided, which is necessary for reproducibility.
Experiment Setup | Yes | Following (Aytar et al., 2016), the Adam optimizer (Kingma & Ba, 2014), a fixed learning rate of 0.001, a momentum term of 0.9, and a batch size of 64 are used throughout the experiments. All weights are initialized from zero-mean Gaussian noise with a standard deviation of 0.01. In the network used on MusicDet200K, the dropout ratio for the dropout layers (Srivastava et al., 2014) after each fully connected layer is set to 0.8. The overall training takes 100,000 iterations.
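The Dataset Splits and Experiment Setup rows above amount to a fairly standard training configuration. The following is a minimal sketch, assuming the tf.keras API and random placeholder data in place of the actual audio loaders: it shows the 20% validation holdout, zero-mean Gaussian initialization (std 0.01), Adam with learning rate 0.001, and a batch size of 64. The 1-D CNN is a hypothetical stand-in for the WSNet architecture, and the quoted "dropout ratio of 0.8" is interpreted here as the Keras drop rate; in the original TF 1.x code it may instead denote a keep probability.

    # Sketch only -- not the authors' code. Dataset loading and the WSNet
    # architecture are placeholders; layer sizes are hypothetical.
    import numpy as np
    import tensorflow as tf

    # Hypothetical raw-waveform inputs and labels; replace with a real loader
    # for ESC-50 / UrbanSound8K / DCASE / MusicDet200K.
    num_samples, input_len, num_classes = 1000, 32000, 50
    x = np.random.randn(num_samples, input_len, 1).astype("float32")
    y = np.random.randint(0, num_classes, size=num_samples)

    # Hold out 20% of training samples to form a validation set (as quoted above).
    split = int(0.8 * num_samples)
    x_train, y_train = x[:split], y[:split]
    x_val, y_val = x[split:], y[split:]

    # Zero-mean Gaussian initialization with standard deviation 0.01.
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)

    # Placeholder 1-D CNN standing in for the WSNet audio architecture.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(32, 8, strides=4, activation="relu",
                               kernel_initializer=init, input_shape=(input_len, 1)),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(256, activation="relu", kernel_initializer=init),
        tf.keras.layers.Dropout(0.8),  # assumed drop rate; possibly keep_prob=0.8 in TF 1.x
        tf.keras.layers.Dense(num_classes, kernel_initializer=init),
    ])

    # Adam with a fixed learning rate of 0.001 (beta_1=0.9 matches the quoted momentum term).
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    # Batch size of 64; the paper trains for 100,000 iterations rather than a fixed epoch count.
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              batch_size=64, epochs=1)

Note that this sketch only mirrors the quoted hyperparameters; reproducing the paper's results would additionally require the weight-sampling layers and the 10-fold cross-validation protocol described in the paper.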