Language model compression with weighted low-rank factorization
Authors: Yen-Chang Hsu, Ting Hua, Sung-En Chang, Qian Lou, Yilin Shen, Hongxia Jin
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform analysis with the transformer-based language models, showing our weighted SVD largely alleviates the mismatched optimization objectives and can maintain model performance with a higher compression rate. Our method can directly compress a task-specific model while achieving better performance than other compact model strategies requiring expensive model pre-training. Moreover, the evaluation of compressing an already compact model shows our method can further reduce 9% to 30% parameters with an insignificant impact on task accuracy. Table 1: Results of CoNLL and GLUE benchmark. |
| Researcher Affiliation | Collaboration | Yen-Chang Hsu¹, Ting Hua¹, Sung-En Chang², Qian Lou¹, Yilin Shen¹, and Hongxia Jin¹; ¹Samsung Research America, ²Northeastern University |
| Pseudocode | No | The paper describes the mathematical formulation of SVD and FWSVD but does not include any pseudocode or algorithm blocks (a hedged sketch of the weighted factorization follows this table). |
| Open Source Code | No | The paper mentions using the 'popular Hugging Face Transformers library (Wolf et al., 2020)' but does not provide a statement or link for the open-source code of their own methodology. |
| Open Datasets | Yes | We evaluate the methods of all three paths in Figure 4 on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2019) and a token classification task. We include 2 single sentence tasks: CoLA (Warstadt et al., 2018) measured in Matthews correlation, SST-2 (Socher et al., 2013) measured in classification accuracy; 3 sentence similarity tasks: MRPC (Dolan et al., 2005) measured in F-1 score, STS-B (Cer et al., 2017) measured in Pearson-Spearman correlation, QQP (Chen et al., 2018b) measured in F-1 score; and 3 natural language inference tasks: MNLI (Williams et al., 2018) measured in classification accuracy with the average of the matched and mismatched subsets, QNLI (Rajpurkar et al., 2016) measured in accuracy. The token classification task we used is the named entity recognition (NER) on the CoNLL-2003 dataset (Sang & De Meulder, 2003). (A data-loading sketch for these public datasets follows this table.) |
| Dataset Splits | Yes | For the SOTA models on path-1 (MiniLMv2 and DistilBERT), we use the pre-trained generic compact models (Sg) provided by the original authors as the starting point, then directly fine-tune them with 3 epochs on the target task training data. The fine-tuning is optimized by Adam with learning rate of 2×10⁻⁵ and batch size of 32 on one GPU. We directly report the results on the dev set of all the datasets, making the numbers convenient to compare and verify. |
| Hardware Specification | No | The paper mentions '384 NVIDIA V100 GPU hours' in the context of other works' pre-training costs, and 'one GPU' for fine-tuning in their own setup. However, it does not specify the exact GPU model or any other hardware specifications used for their experiments. |
| Software Dependencies | No | The paper states: 'Lastly, our implementation and experiments are built on top of the popular Hugging Face Transformers library (Wolf et al., 2020).' However, it does not provide specific version numbers for the library or any other software dependencies such as Python or PyTorch. |
| Experiment Setup | Yes | The fine-tuning is optimized by Adam with learning rate of 2×10⁻⁵ and batch size of 32 on one GPU. |
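
As noted in the Pseudocode row, the paper gives only the mathematical formulation of SVD and FWSVD. The snippet below is a minimal sketch of the Fisher-weighted factorization idea as we read it: scale each row of a weight matrix by the square root of its estimated importance (the diagonal Fisher information, approximated by accumulated squared gradients), truncate a standard SVD of the scaled matrix, and undo the scaling on the left factor. The function and variable names (`fisher_weighted_svd`, `row_fisher`) and the row-wise grouping are our assumptions, not code from the paper.

```python
import torch

def fisher_weighted_svd(weight: torch.Tensor, row_fisher: torch.Tensor, rank: int):
    """Sketch: low-rank factors A, B with A @ B ~ weight, weighting rows by importance.

    weight:      (out_features, in_features) matrix of a linear layer.
    row_fisher:  per-row importance, e.g. squared gradients of the task loss
                 accumulated over training data and summed along each row.
    """
    # Scale each row by sqrt(importance); the best rank-k approximation of the
    # scaled matrix minimizes the row-weighted Frobenius error of the original.
    scale = row_fisher.clamp_min(1e-8).sqrt().unsqueeze(1)   # (out, 1)
    u, s, vh = torch.linalg.svd(scale * weight, full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    # Undo the row scaling on the left factor so that a @ b approximates weight.
    a = (u * s) / scale                                       # (out, rank)
    b = vh                                                    # (rank, in)
    return a, b

# Toy check: factorize a 768x768 matrix to rank 64 with a random importance vector.
w = torch.randn(768, 768)
fisher = torch.rand(768)  # placeholder for accumulated squared gradients
a, b = fisher_weighted_svd(w, fisher, rank=64)
print((a @ b).shape)      # torch.Size([768, 768])
```

With `row_fisher` set to a constant vector this reduces to plain truncated SVD, which is the baseline the paper compares against.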
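All of the evaluation data listed in the Open Datasets row is publicly available; a sketch of how it can be pulled with the Hugging Face `datasets` library is shown below. The paper does not describe its data-loading code, so the dataset identifiers here (the `glue` configs and `conll2003`) are the standard Hub names, not ones confirmed by the authors.

```python
from datasets import load_dataset

# GLUE tasks used in the evaluation (the paper reports dev-set results).
glue_tasks = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli"]
glue = {task: load_dataset("glue", task) for task in glue_tasks}

# Token classification: CoNLL-2003 named entity recognition.
conll = load_dataset("conll2003")

print(glue["sst2"]["validation"][0])   # dev-set example for SST-2
print(conll["validation"][0])          # dev-set example for CoNLL-2003 NER
```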
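The fine-tuning recipe in the Experiment Setup row (Adam, learning rate 2×10⁻⁵, batch size 32, 3 epochs, one GPU, results reported on the dev sets) maps onto a Hugging Face `Trainer` configuration roughly as follows; the authors did not release code, so this is an illustrative mapping rather than their actual script.

```python
from transformers import TrainingArguments

# Hyperparameters as reported: Adam-style optimizer, lr 2e-5, batch size 32,
# 3 epochs, evaluation on the dev split. The output directory is a placeholder.
training_args = TrainingArguments(
    output_dir="fwsvd-finetune",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    evaluation_strategy="epoch",   # report dev-set metrics after each epoch
)
```

A `Trainer` built with these arguments, a (factorized) task model, and one of the datasets above would reproduce the reported recipe at a high level.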