LIT: Learned Intermediate Representation Training for Model Compression
Authors: Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that LIT can substantially reduce network size without loss in accuracy on a range of DNN architectures and datasets. For example, LIT can compress ResNet on CIFAR10 by 3.4×, outperforming network slimming and FitNets. Furthermore, LIT can compress, by depth, ResNeXt 5.5× on CIFAR10 (image classification), VDCNN by 1.7× on Amazon Reviews (sentiment analysis), and StarGAN by 1.8× on CelebA (style transfer, i.e., GANs). We perform an extensive set of experiments. |
| Researcher Affiliation | Academia | Animesh Koratana * 1 Daniel Kang * 1 Peter Bailis 1 Matei Zaharia 1 1Stanford University, DAWN Project. Correspondence to: Daniel Kang <ddkang@stanford.edu>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for LIT is provided at http://github.com/stanford-futuredata/lit-code. |
| Open Datasets | Yes | Dataset (task; models): CIFAR10 (image classification; ResNet, ResNeXt), CIFAR100 (image classification; ResNet, ResNeXt), Amazon Reviews, full and polarity (sentiment analysis; VDCNN), CelebA (image-to-image translation; StarGAN). |
| Dataset Splits | Yes | To set the hyperparameters for a given structure, we first set τ using a small student model, then α for the fixed τ, then β for the fixed α and τ (all on the validation set). |
| Hardware Specification | No | The paper mentions 'hardware support' in general terms but does not provide specific details such as exact GPU models, CPU types, or other hardware specifications used for running experiments. |
| Software Dependencies | No | The paper does not list specific versions for software dependencies like deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA, which are necessary for replication. |
| Experiment Setup | Yes | We used standard architecture depths, widths, and learning rate schedules (described in the Appendix). We have found that iteratively setting τ, then α, then β works well in practice. Section 4.4 analyzes the IR penalty, the hyperparameters, and mixed precision ("LIT works with mixed precision"), and Table 5 compares loss functions for the IR penalty (e.g., ResNet with the L2 loss reaches 93.20 ± 0.04). A hedged sketch of how these pieces fit together is given after this table. |
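
The Experiment Setup and Dataset Splits rows reference a temperature τ, a mixing weight α, and an IR penalty weighted by β with an L2 loss between teacher and student intermediate representations. The PyTorch-style sketch below illustrates how such a combined objective could look; the function name, block pairing, and default hyperparameter values are illustrative assumptions, not the authors' released implementation (linked in the Open Source Code row).

```python
# Hedged sketch of a LIT-style combined objective: knowledge distillation on
# temperature-softened logits plus an L2 penalty between paired teacher and
# student intermediate representations (IRs). The hyperparameter names
# (tau, alpha, beta) follow the quotes above; the block pairing and default
# values are illustrative assumptions, not the released code.
import torch
import torch.nn.functional as F


def lit_style_loss(student_logits, teacher_logits, labels,
                   student_irs, teacher_irs,
                   tau=4.0, alpha=0.9, beta=0.5):
    """student_irs / teacher_irs: lists of matching intermediate tensors."""
    # Standard cross-entropy on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Knowledge-distillation term on temperature-softened logits.
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau * tau)

    # L2 penalty on paired intermediate representations (the "IR penalty");
    # the teacher IRs are treated as fixed targets.
    ir = sum(F.mse_loss(s, t.detach())
             for s, t in zip(student_irs, teacher_irs))

    # Mix the terms with alpha and beta.
    return alpha * kd + (1.0 - alpha) * ce + beta * ir
```

Per the Dataset Splits row, τ, α, and β would be tuned sequentially on a validation set (first τ, then α for the fixed τ, then β for the fixed α and τ) rather than jointly.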