GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

Authors: Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show our method can significantly outperform traditional compression methods such as low-rank approximation and pruning.
Researcher Affiliation | Collaboration | Patrick H. Chen, UCLA, Los Angeles, CA (patrickchen@g.ucla.edu); Si Si, Google Research, Mountain View, CA (sisidaisy@google.com); Yang Li, Google Research, Mountain View, CA (liyang@google.com); Ciprian Chelba, Google Research, Mountain View, CA (ciprianchelba@google.com); Cho-Jui Hsieh, UCLA, Los Angeles, CA (chohsieh@cs.ucla.edu)
Pseudocode | Yes | Algorithm 1: GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking (a minimal sketch of the block-wise idea appears after this table).
Open Source Code | No | The paper does not provide explicit statements or links indicating that open-source code for the described methodology is available.
Open Datasets | Yes | For LM, we evaluate GroupReduce on two datasets: Penn Treebank (PTB) and the One-Billion-Word Benchmark (OBW). OBW was introduced by [2] and contains a vocabulary of 793,471 words, with the sentences shuffled and duplicates removed. For NMT, we evaluate our method on the IWSLT 2014 German-to-English translation task [1].
Dataset Splits | No | The paper mentions the use of "validation perplexity" for adjusting the learning rate ("Whenever the validation perplexity does not drop down, we decrease the learning rate to an order smaller."), indicating a validation set was used, but it does not specify the exact split percentages, sample counts, or explicit methodology for creating the training, validation, and test splits for the datasets used.
Hardware Specification | No | The paper mentions "Google Cloud and Nvidia" in the acknowledgement section, implying the use of their resources, but it does not specify any particular GPU models, CPU types, or other detailed hardware specifications used for running the experiments.
Software Dependencies | No | The paper mentions using "PyTorch" and "OpenNMT" for the NMT task but does not specify the version numbers for these software components.
Experiment Setup | Yes | We train a 2-layer LSTM-based language model on PTB from scratch with two setups: PTB-Small and PTB-Large. The LSTM hidden state sizes are 200 for PTB-Small and 1500 for PTB-Large, as are their embedding sizes. In the experiments, we set the number of clusters to 5 for the PTB and IWSLT datasets and 20 for the OBW dataset. After approximation, we retrain the remaining parameters with the SGD optimizer, using an initial learning rate of 0.1. Whenever the validation perplexity does not drop, we decrease the learning rate by an order of magnitude. (A sketch of this retraining schedule follows the table.)
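
For context on the Pseudocode row: below is a minimal sketch of the block-wise low-rank approximation named in Algorithm 1, written in plain NumPy. It is not the authors' code (none is released, per the Open Source Code row). It assumes the vocabulary rows have already been assigned to clusters, and the names blockwise_low_rank, cluster_ids, and ranks are illustrative, not taken from the paper.

    import numpy as np

    def blockwise_low_rank(W, cluster_ids, ranks):
        """Approximate each row-cluster of W by a truncated SVD of the given rank.

        W           : (vocab_size, dim) embedding or softmax weight matrix
        cluster_ids : length-vocab_size array assigning each row to a cluster
        ranks       : dict mapping cluster id -> rank kept for that block
        Returns the reconstructed matrix so the approximation error can be
        inspected; a real implementation would store the per-block factors instead.
        """
        W_hat = np.zeros_like(W)
        for c in np.unique(cluster_ids):
            rows = np.where(cluster_ids == c)[0]
            U, S, Vt = np.linalg.svd(W[rows], full_matrices=False)
            r = min(ranks[c], len(S))
            W_hat[rows] = (U[:, :r] * S[:r]) @ Vt[:r]
        return W_hat

    # Toy usage: a 1000-word vocabulary with 200-dim embeddings and 5 clusters
    # (5 clusters is the setting the paper reports for PTB and IWSLT).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((1000, 200))
    cluster_ids = rng.integers(0, 5, size=1000)
    ranks = {c: 20 for c in range(5)}
    W_hat = blockwise_low_rank(W, cluster_ids, ranks)
    print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))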
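
The learning-rate rule quoted in the Experiment Setup row ("decrease the learning rate to an order smaller" whenever validation perplexity does not drop) corresponds to dividing the rate by 10 when a validation metric stalls, which PyTorch's built-in ReduceLROnPlateau scheduler can express with factor 0.1. The snippet below is one plausible reading of that schedule, not the authors' training script; the LSTM model and the validation-perplexity value are placeholders.

    import torch

    # Placeholder model: the paper retrains the remaining parameters after the
    # embedding/softmax matrices are replaced by their block-wise low-rank factors.
    model = torch.nn.LSTM(input_size=200, hidden_size=200, num_layers=2)

    # SGD with initial learning rate 0.1, as stated in the Experiment Setup row.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Assumed reading of "decrease the learning rate to an order smaller":
    # multiply the learning rate by 0.1 each time validation perplexity
    # fails to improve.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=0)

    for epoch in range(40):
        # ... one training epoch and a validation pass would go here ...
        val_perplexity = 100.0  # placeholder; use the real validation perplexity
        scheduler.step(val_perplexity)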