LIT: Learned Intermediate Representation Training for Model Compression

Authors: Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that LIT can substantially reduce network size without loss in accuracy on a range of DNN architectures and datasets. For example, LIT can compress ResNet on CIFAR10 by 3.4×, outperforming network slimming and FitNets. Furthermore, LIT can compress, by depth, ResNeXt by 5.5× on CIFAR10 (image classification), VDCNN by 1.7× on Amazon Reviews (sentiment analysis), and StarGAN by 1.8× on CelebA (style transfer, i.e., GANs). We perform an extensive set of experiments.
Researcher Affiliation | Academia | Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia (Stanford University, DAWN Project). Correspondence to: Daniel Kang <ddkang@stanford.edu>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for LIT is provided at http://github.com/stanford-futuredata/lit-code.
Open Datasets | Yes | Datasets, tasks, and models: CIFAR10 (image classification; ResNet, ResNeXt), CIFAR100 (image classification; ResNet, ResNeXt), Amazon Reviews, full and polarity (sentiment analysis; VDCNN), and CelebA (image-to-image translation; StarGAN).
Dataset Splits | Yes | To set the hyperparameters for a given structure, we first set τ using a small student model, then α for the fixed τ, then β for the fixed α and τ (all on the validation set). (A sketch of this sequential sweep follows the table.)
Hardware Specification | No | The paper mentions 'hardware support' in general terms but does not provide specific details such as exact GPU models, CPU types, or other hardware specifications used for running the experiments.
Software Dependencies | No | The paper does not list specific versions for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or CUDA, which are necessary for replication.
Experiment Setup | Yes | We used standard architecture depths, widths, and learning rate schedules (described in the Appendix). We have found that iteratively setting τ, then α, then β works well in practice. Section 4.4 analyzes the IR penalty, the hyperparameters, and mixed precision: Table 5 compares loss functions for the IR penalty (e.g., ResNet with the L2 penalty: 93.20 ± 0.04), and the paper states that LIT works with mixed precision. (A sketch of a LIT-style loss combining these terms follows the table.)
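
The Dataset Splits and Experiment Setup rows above refer to three hyperparameters (τ, α, β) and an L2 intermediate-representation (IR) penalty. Below is a minimal PyTorch sketch of a LIT-style objective under the common knowledge-distillation parameterization: τ is the softmax temperature, α mixes the soft-target and hard-label terms, and β weights the IR penalty between matched teacher and student activations. The function name, default values, and the exact way the terms are combined are illustrative assumptions, not taken from the paper or its released code.

```python
import torch
import torch.nn.functional as F

def lit_style_loss(student_logits, teacher_logits, labels,
                   student_irs, teacher_irs,
                   tau=4.0, alpha=0.9, beta=0.5):
    """Illustrative LIT-style objective: KD loss plus an L2 IR penalty.

    student_irs / teacher_irs are lists of intermediate activations taken
    at matching section boundaries of the student and teacher networks.
    Hyperparameter defaults are placeholders, not values from the paper.
    """
    # Soft-target term: KL divergence between temperature-scaled outputs,
    # rescaled by tau^2 as is standard for knowledge distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau * tau)

    # Hard-label cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)

    # L2 penalty between matched intermediate representations
    # (Table 5 of the paper compares loss functions for this term).
    ir = sum(F.mse_loss(s, t) for s, t in zip(student_irs, teacher_irs))

    return alpha * soft + (1.0 - alpha) * hard + beta * ir
```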
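
The sequential procedure quoted in the Dataset Splits row (set τ first, then α for that τ, then β for that α and τ, all on the validation set) can be sketched as below. `train_and_validate` is a hypothetical callback that trains a small student with the given settings and returns validation accuracy; the candidate grids and the placeholder values used while a later hyperparameter is still unset are assumptions for illustration, not values from the paper.

```python
def sequential_sweep(train_and_validate,
                     taus=(1, 2, 4, 8),
                     alphas=(0.5, 0.7, 0.9, 0.95),
                     betas=(0.1, 0.5, 1.0)):
    """Fix tau, then alpha, then beta, each chosen by validation accuracy."""
    # Placeholder alpha/beta while the earlier stages run; illustrative only.
    tau = max(taus, key=lambda t: train_and_validate(tau=t, alpha=0.9, beta=0.0))
    alpha = max(alphas, key=lambda a: train_and_validate(tau=tau, alpha=a, beta=0.0))
    beta = max(betas, key=lambda b: train_and_validate(tau=tau, alpha=alpha, beta=b))
    return tau, alpha, beta
```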