Soft Threshold Weight Reparameterization for Learnable Sparsity
Authors: Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimentation showing that STR achieves state-of-the-art accuracy for sparse CNNs (ResNet50 and MobileNetV1 on ImageNet-1K) along with a significant reduction in inference FLOPs. |
| Researcher Affiliation | Collaboration | 1University of Washington, USA 2Allen Institute for Artificial Intelligence, USA 3Microsoft Research, India. |
| Pseudocode | Yes | Algorithm 1 in the Appendix shows the implementation of STR on 2D convolution, along with extensions to global, per-filter, and per-weight sparsity. |
| Open Source Code | Yes | Code, pretrained models and sparsity budgets are at https://github.com/RAIVNLab/STR. |
| Open Datasets | Yes | ImageNet-1K (Deng et al., 2009) is a widely used large-scale image classification dataset with 1K classes. All the CNN experiments presented are on ImageNet-1K. ... Google-12 is a speech recognition dataset that has 12 classes made from the Google Speech Commands dataset (Warden, 2018). HAR-2 is a binarized version of the 6-class Human Activity Recognition dataset (Anguita et al., 2012). |
| Dataset Splits | Yes | ImageNet-1K (Deng et al., 2009) is a widely used large-scale image classification dataset with 1K classes. All the CNN experiments presented are on ImageNet-1K. ResNet50 (He et al., 2016) and MobileNetV1 (Howard et al., 2017) are two popular CNN architectures. ... A fully dense ResNet50 trained on ImageNet-1K has 77.01% top-1 validation accuracy. |
| Hardware Specification | Yes | Experiments were run on a machine with 4 NVIDIA Titan X (Pascal) GPUs. |
| Software Dependencies | No | The paper mentions implementing in PyTorch ('PyTorch STRConv code') and refers to PyTorch (Paszke et al., 2019), but does not specify the version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The experiments for STR use a batch size of 256, a cosine learning rate routine, and train for 100 epochs following the hyperparameter settings in (Wortsman et al., 2019) using SGD + momentum. STR has weight-decay (λ) and s_init hyperparameters that control the overall sparsity in CNNs; their values can be found in Appendix A.6. |
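The reparameterization at the heart of the paper (and of the STRConv pseudocode the review references) is a soft-threshold applied to each weight, with the threshold itself learned. A minimal NumPy sketch of that forward transform is below; the formula S(w, s) = sign(w) · ReLU(|w| − g(s)) with g = sigmoid follows the paper, but the function and variable names here are illustrative, not the repository's actual API.

```python
import numpy as np

def sigmoid(s):
    """Numerically plain logistic function g(s) used to map the
    learnable scalar s to a positive threshold in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

def str_forward(w, s):
    """Soft Threshold Reparameterization of a weight tensor.

    Weights whose magnitude falls below the learned threshold g(s)
    are zeroed; the rest are shrunk toward zero by that threshold,
    so sparsity emerges from training rather than a fixed budget.
    """
    threshold = sigmoid(s)
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

# With s = 0, the threshold is sigmoid(0) = 0.5: small weights vanish,
# larger ones survive with reduced magnitude.
w = np.array([0.3, -0.8, 0.6, 0.1])
sparse_w = str_forward(w, s=0.0)  # -> [0.0, -0.3, 0.1, 0.0]
```

In the paper, s is trained jointly with the weights, and the weight-decay λ and initialization s_init noted in the table above steer how aggressive the learned thresholds (and hence the overall sparsity) become.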