LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification

Authors: Sharath Girish, Kamal Gupta, Saurabh Singh, Abhinav Shrivastava

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimentation on three standard datasets shows that our framework achieves high levels of structured sparsity in the trained models. Additionally, the introduced priors show gains even in model compression compared to previous state-of-the-art.
Researcher Affiliation | Collaboration | Sharath Girish¹, Kamal Gupta¹, Saurabh Singh² & Abhinav Shrivastava¹ (¹University of Maryland, College Park; ²Google Research); {sgirish,kampta,abhinav}@cs.umd.edu, saurabhsingh@google.com
Pseudocode | No | The paper does not contain a clearly labeled pseudocode block or algorithm steps.
Open Source Code | Yes | Code is available at https://github.com/Sharath-girish/LilNetX.
Open Datasets | Yes | Datasets. We consider three datasets in our experiments. The CIFAR-10 and CIFAR-100 datasets (Krizhevsky et al., 2009) consist of 50000 training and 10000 test color images, each of size 32×32. For large-scale experiments, we use the ILSVRC2012 (ImageNet) dataset (Deng et al., 2009). It has 1.2 million images for training, 50000 images for testing, and 1000 classes.
Dataset Splits | No | The paper mentions using 50000 training and 10000 test images for CIFAR-10/100, and 1.2 million training and 50000 test images for ImageNet. While 'validation accuracy' is mentioned, specific details on the validation set split (e.g., size or percentage) are not provided.
Hardware Specification | Yes | Speedups are measured on a single core of an AMD EPYC 7302 16-Core Processor with a batch size of 16.
Software Dependencies | No | The paper mentions software such as the Adam optimizer, the torchac library, and the FFCV library, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2014) for updating all parameters of our models. The entropy model parameters are optimized with a learning rate of 10⁻⁴ for all our experiments. The remaining parameters are optimized with a learning rate of 0.01 for CIFAR-10 experiments and a learning rate of 0.02 for ResNet-18/50 on ImageNet with a cyclic schedule. Our model compression results are reported using the torchac library (Mentzer et al., 2019)... We train ResNet-18/50 for 35 epochs to keep the range of the uncompressed network accuracies similar to other works for a fair comparison. CIFAR-10/100 experiments are trained for 200 epochs. ...with a batch size of 512 for ResNet-18/50 split across 4 GPUs.
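The experiment-setup row above describes Adam with two learning rates (10⁻⁴ for the entropy-model parameters, 0.01/0.02 with a cyclic schedule for the rest). The following is a minimal PyTorch sketch of that configuration, not the authors' released code: the ToyLilNetX module, its entropy_model submodule, and the cycle length are hypothetical stand-ins used only to make the parameter grouping concrete.

import torch
import torch.nn as nn

class ToyLilNetX(nn.Module):
    """Hypothetical stand-in: a tiny backbone plus a placeholder entropy model."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
        )
        # Placeholder for the learned prior over quantized weights.
        self.entropy_model = nn.Linear(8, 8)

    def forward(self, x):
        return self.backbone(x)

model = ToyLilNetX()
network_params = list(model.backbone.parameters())
entropy_params = list(model.entropy_model.parameters())

# Two parameter groups, matching the learning rates quoted above
# (0.01 for CIFAR-10, 0.02 for ResNet-18/50 on ImageNet; 1e-4 for the entropy model).
optimizer = torch.optim.Adam([
    {"params": network_params, "lr": 0.02},
    {"params": entropy_params, "lr": 1e-4},
])

# Cyclic schedule on the network learning rate; the paper does not specify the
# cycle shape or length, so base_lr/max_lr/step_size_up here are illustrative.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=[1e-4, 1e-4],
    max_lr=[0.02, 1e-4],   # keep the entropy-model learning rate effectively constant
    step_size_up=1000,
    cycle_momentum=False,  # required with Adam, which has no momentum term to cycle
)

# Typical training step: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()

In a real run, the two groups would be split by walking model.named_parameters() and separating the entropy-model parameters from the network weights; the stand-in module above just makes that split explicit.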