LIT: Learned Intermediate Representation Training for Model Compression

Authors: Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that LIT can substantially reduce network size without loss in accuracy on a range of DNN architectures and datasets. For example, LIT can compress ResNet on CIFAR10 by 3.4×, outperforming network slimming and FitNets. Furthermore, LIT can compress, by depth, ResNeXt by 5.5× on CIFAR10 (image classification), VDCNN by 1.7× on Amazon Reviews (sentiment analysis), and StarGAN by 1.8× on CelebA (style transfer, i.e., GANs). We perform an extensive set of experiments.
Researcher Affiliation | Academia | Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia (Stanford University, DAWN Project). Correspondence to: Daniel Kang <ddkang@stanford.edu>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for LIT is provided at http://github.com/stanford-futuredata/lit-code.
Open Datasets | Yes | Datasets, tasks, and models: CIFAR10 (image classification; ResNet, ResNeXt), CIFAR100 (image classification; ResNet, ResNeXt), Amazon Reviews, full and polarity (sentiment analysis; VDCNN), and CelebA (image-to-image translation; StarGAN).
Dataset Splits | Yes | To set the hyperparameters for a given structure, we first set τ using a small student model, then α for the fixed τ, then β for the fixed α and τ (all on the validation set). (A sketch of this sequential sweep follows the table.)
Hardware Specification | No | The paper mentions 'hardware support' in general terms but does not provide specific details such as exact GPU models, CPU types, or other hardware specifications used for running the experiments.
Software Dependencies | No | The paper does not list specific versions for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA, which are necessary for replication.
Experiment Setup | Yes | We used standard architecture depths, widths, and learning rate schedules (described in the Appendix). We have found that iteratively setting τ, then α, then β works well in practice. Section 4.4 analyzes the IR penalty, the hyperparameters, and mixed precision: Table 5 compares loss functions for the IR penalty (e.g., ResNet with the L2 penalty: 93.20 ± 0.04), and the paper states that LIT works with mixed precision. (A sketch of a LIT-style loss combining these terms follows the table.)
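
The Dataset Splits and Experiment Setup rows above refer to three hyperparameters (τ, α, β) and an L2 intermediate-representation (IR) penalty. Below is a minimal PyTorch sketch of a LIT-style objective under the common knowledge-distillation parameterization: τ is the softmax temperature, α mixes the soft-target and hard-label terms, and β weights the IR penalty between matched teacher and student activations. The function name, default values, and the exact way the terms are combined are illustrative assumptions, not taken from the paper or its released code.

```python
import torch
import torch.nn.functional as F

def lit_style_loss(student_logits, teacher_logits, labels,
                   student_irs, teacher_irs,
                   tau=4.0, alpha=0.9, beta=0.5):
    """Illustrative LIT-style objective: KD loss plus an L2 IR penalty.

    student_irs / teacher_irs are lists of intermediate activations taken
    at matching section boundaries of the student and teacher networks.
    Hyperparameter defaults are placeholders, not values from the paper.
    """
    # Soft-target term: KL divergence between temperature-scaled outputs,
    # rescaled by tau^2 as is standard for knowledge distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau * tau)

    # Hard-label cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)

    # L2 penalty between matched intermediate representations
    # (Table 5 of the paper compares loss functions for this term).
    ir = sum(F.mse_loss(s, t) for s, t in zip(student_irs, teacher_irs))

    return alpha * soft + (1.0 - alpha) * hard + beta * ir
```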
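
The sequential procedure quoted in the Dataset Splits row (set τ first, then α for that τ, then β for that α and τ, all on the validation set) can be sketched as below. `train_and_validate` is a hypothetical callback that trains a small student with the given settings and returns validation accuracy; the candidate grids and the placeholder values used while a later hyperparameter is still unset are assumptions for illustration, not values from the paper.

```python
def sequential_sweep(train_and_validate,
                     taus=(1, 2, 4, 8),
                     alphas=(0.5, 0.7, 0.9, 0.95),
                     betas=(0.1, 0.5, 1.0)):
    """Fix tau, then alpha, then beta, each chosen by validation accuracy."""
    # Placeholder alpha/beta while the earlier stages run; illustrative only.
    tau = max(taus, key=lambda t: train_and_validate(tau=t, alpha=0.9, beta=0.0))
    alpha = max(alphas, key=lambda a: train_and_validate(tau=tau, alpha=a, beta=0.0))
    beta = max(betas, key=lambda b: train_and_validate(tau=tau, alpha=alpha, beta=b))
    return tau, alpha, beta
```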