Soft Threshold Weight Reparameterization for Learnable Sparsity

Authors: Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below lists the result and the supporting LLM response.

Research Type: Experimental. "Extensive experimentation showing that STR achieves the state-of-the-art accuracy for sparse CNNs (ResNet50 and MobileNetV1 on ImageNet-1K) along with a significant reduction in inference FLOPs."

Researcher Affiliation: Collaboration. "University of Washington, USA; Allen Institute for Artificial Intelligence, USA; Microsoft Research, India."

Pseudocode: Yes. "Algorithm 1 in the Appendix shows the implementation of STR on 2D convolution along with extensions to global, per-filter & per-weight sparsity." (A hedged sketch of such a layer appears after this list.)

Open Source Code: Yes. "Code, pretrained models and sparsity budgets are at https://github.com/RAIVNLab/STR."

Open Datasets: Yes. "ImageNet-1K (Deng et al., 2009) is a widely used large-scale image classification dataset with 1K classes. All the CNN experiments presented are on ImageNet-1K. ... Google-12 is a speech recognition dataset that has 12 classes made from the Google Speech Commands dataset (Warden, 2018). HAR-2 is a binarized version of the 6-class Human Activity Recognition dataset (Anguita et al., 2012)."

Dataset Splits: Yes. "ImageNet-1K (Deng et al., 2009) is a widely used large-scale image classification dataset with 1K classes. All the CNN experiments presented are on ImageNet-1K. ResNet50 (He et al., 2016) and MobileNetV1 (Howard et al., 2017) are two popular CNN architectures. ... A fully dense ResNet50 trained on ImageNet-1K has 77.01% top-1 validation accuracy."

Hardware Specification: Yes. "Experiments were run on a machine with 4 NVIDIA Titan X (Pascal) GPUs."

Software Dependencies: No. The paper mentions a PyTorch implementation ("PyTorch STRConv code") and cites PyTorch (Paszke et al., 2019), but does not specify version numbers for PyTorch or any other software dependencies.

Experiment Setup: Yes. "The experiments for STR use a batch size of 256, a cosine learning rate routine, and are trained for 100 epochs following the hyperparameter settings in (Wortsman et al., 2019) using SGD + momentum. STR has weight-decay (λ) and sinit hyperparameters to control the overall sparsity in CNNs; their values can be found in Appendix A.6." (A sketch of this optimization setup follows the STRConv example below.)
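To ground the Pseudocode entry, here is a minimal PyTorch sketch of an STR convolution layer, assuming the paper's reparameterization S_g(W, s) = sign(W) * ReLU(|W| - g(s)) with g taken to be the sigmoid and a single learnable per-layer threshold s. The class interface and the s_init default below are illustrative, not the reference implementation, which lives in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STRConv(nn.Conv2d):
    """Sketch of STR applied to 2D convolution (cf. the paper's Algorithm 1).

    The dense kernel W is reparameterized as
        S_g(W, s) = sign(W) * ReLU(|W| - g(s)),
    where s is a learnable per-layer scalar and g is assumed to be the
    sigmoid. Weight decay on s raises the threshold g(s), so the layer's
    sparsity level is learned jointly with the weights.
    """

    def __init__(self, *args, s_init=-10.0, **kwargs):
        # s_init is a placeholder default; the paper tunes it per run
        # (see its Appendix A.6).
        super().__init__(*args, **kwargs)
        self.s = nn.Parameter(torch.tensor(float(s_init)))

    def sparse_weight(self):
        # Soft-threshold the kernel with the learned threshold g(s).
        threshold = torch.sigmoid(self.s)
        return torch.sign(self.weight) * F.relu(torch.abs(self.weight) - threshold)

    def forward(self, x):
        # Convolve with the sparsified kernel instead of the dense one.
        return F.conv2d(x, self.sparse_weight(), self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

A per-filter variant of this sketch would make s a vector broadcast over the kernel's output-channel dimension, in the spirit of the per-filter and per-weight extensions the quoted response mentions.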
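The Experiment Setup entry translates to a standard schedule. Below is a minimal sketch assuming an initial learning rate and momentum value (neither is stated in this excerpt) and a placeholder weight decay; the paper's per-run λ and sinit values are in its Appendix A.6.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; in the paper this would be ResNet50 or MobileNetV1
# with its convolutions replaced by STRConv (sketched above).
model = nn.Sequential(STRConv(3, 64, kernel_size=3, padding=1), nn.ReLU())

# lr and momentum are assumed values; weight_decay is the lambda that
# drives overall sparsity through the learnable thresholds s.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)
# Cosine learning rate routine over the stated 100-epoch budget.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one training pass over ImageNet-1K with batch size 256 ...
    scheduler.step()
```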