Soft Threshold Weight Reparameterization for Learnable Sparsity

Authors: Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below lists the result and the supporting LLM response.

Research Type: Experimental. "Extensive experimentation showing that STR achieves the state-of-the-art accuracy for sparse CNNs (ResNet50 and MobileNetV1 on ImageNet-1K) along with a significant reduction in inference FLOPs."

Researcher Affiliation: Collaboration. "University of Washington, USA; Allen Institute for Artificial Intelligence, USA; Microsoft Research, India."

Pseudocode: Yes. "Algorithm 1 in the Appendix shows the implementation of STR on 2D convolution along with extensions to global, per-filter & per-weight sparsity." (A hedged sketch of such a layer appears after this list.)

Open Source Code: Yes. "Code, pretrained models and sparsity budgets are at https://github.com/RAIVNLab/STR."

Open Datasets: Yes. "ImageNet-1K (Deng et al., 2009) is a widely used large-scale image classification dataset with 1K classes. All the CNN experiments presented are on ImageNet-1K. ... Google-12 is a speech recognition dataset that has 12 classes made from the Google Speech Commands dataset (Warden, 2018). HAR-2 is a binarized version of the 6-class Human Activity Recognition dataset (Anguita et al., 2012)."

Dataset Splits: Yes. "ImageNet-1K (Deng et al., 2009) is a widely used large-scale image classification dataset with 1K classes. All the CNN experiments presented are on ImageNet-1K. ResNet50 (He et al., 2016) and MobileNetV1 (Howard et al., 2017) are two popular CNN architectures. ... A fully dense ResNet50 trained on ImageNet-1K has 77.01% top-1 validation accuracy."

Hardware Specification: Yes. "Experiments were run on a machine with 4 NVIDIA Titan X (Pascal) GPUs."

Software Dependencies: No. The paper mentions a PyTorch implementation ("PyTorch STRConv code") and cites PyTorch (Paszke et al., 2019), but does not specify version numbers for PyTorch or any other software dependencies.

Experiment Setup: Yes. "The experiments for STR use a batch size of 256, a cosine learning rate routine, and are trained for 100 epochs following the hyperparameter settings in (Wortsman et al., 2019) using SGD + momentum. STR has weight-decay (λ) and sinit hyperparameters to control the overall sparsity in CNNs; their values can be found in Appendix A.6." (A sketch of this optimization setup follows the STRConv example below.)
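To ground the Pseudocode entry, here is a minimal PyTorch sketch of an STR convolution layer, assuming the paper's reparameterization S_g(W, s) = sign(W) * ReLU(|W| - g(s)) with g taken to be the sigmoid and a single learnable per-layer threshold s. The class interface and the s_init default below are illustrative, not the reference implementation, which lives in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STRConv(nn.Conv2d):
    """Sketch of STR applied to 2D convolution (cf. the paper's Algorithm 1).

    The dense kernel W is reparameterized as
        S_g(W, s) = sign(W) * ReLU(|W| - g(s)),
    where s is a learnable per-layer scalar and g is assumed to be the
    sigmoid. Weight decay on s raises the threshold g(s), so the layer's
    sparsity level is learned jointly with the weights.
    """

    def __init__(self, *args, s_init=-10.0, **kwargs):
        # s_init is a placeholder default; the paper tunes it per run
        # (see its Appendix A.6).
        super().__init__(*args, **kwargs)
        self.s = nn.Parameter(torch.tensor(float(s_init)))

    def sparse_weight(self):
        # Soft-threshold the kernel with the learned threshold g(s).
        threshold = torch.sigmoid(self.s)
        return torch.sign(self.weight) * F.relu(torch.abs(self.weight) - threshold)

    def forward(self, x):
        # Convolve with the sparsified kernel instead of the dense one.
        return F.conv2d(x, self.sparse_weight(), self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

A per-filter variant of this sketch would make s a vector broadcast over the kernel's output-channel dimension, in the spirit of the per-filter and per-weight extensions the quoted response mentions.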
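The Experiment Setup entry translates to a standard schedule. Below is a minimal sketch assuming an initial learning rate and momentum value (neither is stated in this excerpt) and a placeholder weight decay; the paper's per-run λ and sinit values are in its Appendix A.6.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; in the paper this would be ResNet50 or MobileNetV1
# with its convolutions replaced by STRConv (sketched above).
model = nn.Sequential(STRConv(3, 64, kernel_size=3, padding=1), nn.ReLU())

# lr and momentum are assumed values; weight_decay is the lambda that
# drives overall sparsity through the learnable thresholds s.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)
# Cosine learning rate routine over the stated 100-epoch budget.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one training pass over ImageNet-1K with batch size 256 ...
    scheduler.step()
```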