WSNet: Compact and Efficient Networks Through Weight Sampling
Authors: Xiaojie Jin, Yingzhen Yang, Ning Xu, Jianchao Yang, Nebojsa Jojic, Jiashi Feng, Shuicheng Yan
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple audio classification datasets verify the effectiveness of WSNet. Combined with weight quantization, the resulted models are up to 180× smaller and theoretically up to 16× faster than the well-established baselines, without noticeable performance drop. (A hedged sketch of the weight-sampling scheme behind these figures follows the table.) |
| Researcher Affiliation | Collaboration | 1National University of Singapore, Singapore 2Snap Inc. Research, Los Angeles, USA 3Bytedance Inc., Menlo Park, USA 4Microsoft Research, Redmond, USA 5360 AI Institute, Beijing, China. |
| Pseudocode | No | The paper describes mathematical formulations and processes but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes are available at https://github.com/AIROBOTAI/wsnet-v1. |
| Open Datasets | Yes | On each test dataset, including ESC-50 (Piczak, 2015a), UrbanSound8K (Salamon et al., 2014), DCASE (Stowell et al., 2015) and MusicDet200K (a self-collected dataset, as detailed in Section 4), WSNet significantly reduces the model size of the baseline by 100× with comparable or even higher classification accuracy. |
| Dataset Splits | Yes | All results of WSNet are obtained by 10-fold cross-validation. [...] For each dataset, we hold out 20% of training samples to form a validation set. (A sketch of this split protocol follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | WSNet is implemented and trained from scratch in TensorFlow (Abadi et al., 2016). While TensorFlow is mentioned, no version number is given for TensorFlow or any other software library, which makes exact reproduction difficult. |
| Experiment Setup | Yes | Following (Aytar et al., 2016), the Adam (Kingma & Ba, 2014) optimizer, a fixed learning rate of 0.001, a momentum term of 0.9, and a batch size of 64 are used throughout the experiments. We initialized all the weights to zero-mean Gaussian noise with a standard deviation of 0.01. In the network used on MusicDet200K, the dropout ratio for the dropout layers (Srivastava et al., 2014) after each fully connected layer is set to 0.8. The overall training takes 100,000 iterations. (A training-configuration sketch based on these values follows the table.) |
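
The compression and speed-up figures quoted in the Research Type row stem from WSNet's core idea of sampling convolution filters from a shared, condensed weight vector instead of storing each filter independently. The sketch below illustrates only the spatial-dimension sampling in plain NumPy; the function name `sample_filters`, the toy sizes, and the stride are illustrative choices and do not reproduce the paper's channel-dimension sampling or weight quantization.

```python
# Minimal sketch of WSNet-style weight sampling along the spatial dimension:
# every 1D convolution filter is an overlapping window read out of one shared
# "condensed" weight vector. Names and sizes here are illustrative only.
import numpy as np

def sample_filters(condensed, filter_len, stride):
    """Slice overlapping windows of length `filter_len` from the shared 1D
    weight vector `condensed`, advancing by `stride` for each filter."""
    n_filters = (len(condensed) - filter_len) // stride + 1
    return np.stack([condensed[i * stride: i * stride + filter_len]
                     for i in range(n_filters)])

condensed = np.random.randn(40).astype(np.float32)            # 40 shared weights
filters = sample_filters(condensed, filter_len=8, stride=2)
print(filters.shape)  # (17, 8): 17 filters expressed by 40 weights instead of 136
```

Because the sampling stride is smaller than the filter length, neighbouring filters overlap and share weights, which is where the reduction in stored parameters comes from.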
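
For the Dataset Splits row, the sketch below spells out the reported protocol: 10-fold validation with 20% of each fold's training samples held out as a validation set. It uses scikit-learn and random folds purely for illustration; several of the benchmark datasets ship with predefined folds, so this is an assumption about the mechanics rather than the authors' exact procedure.

```python
# Illustrative 10-fold split with a 20% hold-out validation set per fold.
import numpy as np
from sklearn.model_selection import KFold, train_test_split

n_samples = 2000                            # placeholder dataset size
dummy_features = np.zeros((n_samples, 1))   # stand-in for the audio clips

for fold, (train_idx, test_idx) in enumerate(
        KFold(n_splits=10, shuffle=True, random_state=0).split(dummy_features)):
    # Hold out 20% of this fold's training samples for validation.
    train_idx, val_idx = train_test_split(train_idx, test_size=0.2, random_state=0)
    print(f"fold {fold}: train={len(train_idx)} val={len(val_idx)} test={len(test_idx)}")
```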
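
The Experiment Setup row translates directly into a training configuration. The tf.keras sketch below wires those hyperparameters around a placeholder classifier; `build_classifier`, the layer sizes, and the input shape are assumptions rather than the paper's architecture, and the excerpt does not say whether the 0.8 dropout ratio is the drop or the keep probability, so the value below simply mirrors the quoted number.

```python
import tensorflow as tf

def build_classifier(num_classes=50):
    # Zero-mean Gaussian initialization with std 0.01, as quoted above.
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01)
    # Placeholder architecture; the paper's actual 1D CNN is not reproduced here.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, 1)),  # raw 1D waveform, variable length
        tf.keras.layers.Conv1D(32, 8, strides=4, activation="relu",
                               kernel_initializer=init),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(256, activation="relu", kernel_initializer=init),
        tf.keras.layers.Dropout(0.8),  # "dropout ratio ... 0.8"; drop vs. keep is ambiguous
        tf.keras.layers.Dense(num_classes, kernel_initializer=init),
    ])

model = build_classifier()
model.compile(
    # Fixed learning rate 0.001; beta_1=0.9 plays the role of the momentum term.
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# Training runs for 100,000 iterations at batch size 64, e.g.:
# model.fit(train_x, train_y, batch_size=64, epochs=...)
```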