GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Authors: Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show our method can significantly outperform traditional compression methods such as low-rank approximation and pruning. |
| Researcher Affiliation | Collaboration | Patrick H. Chen (UCLA, Los Angeles, CA) patrickchen@g.ucla.edu; Si Si (Google Research, Mountain View, CA) sisidaisy@google.com; Yang Li (Google Research, Mountain View, CA) liyang@google.com; Ciprian Chelba (Google Research, Mountain View, CA) ciprianchelba@google.com; Cho-Jui Hsieh (UCLA, Los Angeles, CA) chohsieh@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1: GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking (a minimal sketch of the block-wise scheme is given after the table) |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that open-source code for the described methodology is available. |
| Open Datasets | Yes | For LM, we evaluate GroupReduce on two datasets: Penn Treebank (PTB) and the One-Billion-Word Benchmark (OBW). OBW is introduced by [2], and it contains a vocabulary of 793,471 words with the sentences shuffled and the duplicates removed. For NMT, we evaluate our method on the IWSLT 2014 German-to-English translation task [1]. |
| Dataset Splits | No | The paper mentions using validation perplexity to adjust the learning rate ("Whenever the validation perplexity does not drop down, we decrease the learning rate to an order smaller."), indicating that a validation set was used, but it does not specify split percentages, sample counts, or how the training, validation, and test splits were created for the datasets used. |
| Hardware Specification | No | The paper mentions "Google Cloud and Nvidia" in the acknowledgement section, implying the use of their resources, but it does not specify any particular GPU models, CPU types, or other detailed hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using "PyTorch" and "OpenNMT" for the NMT task but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We train a 2-layer LSTM-based language model on PTB from scratch with two setups: PTB-Small and PTB-Large. The LSTM hidden state sizes are 200 for PTB-Small and 1500 for PTB-Large, as are their embedding sizes. In the experiments, we set the number of clusters to 5 for the PTB and IWSLT datasets and 20 for the OBW dataset. After approximation, we retrain the remaining parameters with the SGD optimizer at an initial learning rate of 0.1. Whenever the validation perplexity does not drop, we decrease the learning rate to an order smaller. (A hedged sketch of this retraining schedule follows the table.) |
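
For the Pseudocode row above, the following is a minimal sketch of the block-wise low-rank idea behind Algorithm 1, assuming a simple frequency-sorted row clustering and a fixed rank per block. The paper's actual GroupReduce additionally weights the approximation by word frequency and refines the cluster assignment and per-block ranks iteratively; the function name, arguments, and toy data below are illustrative, not taken from the paper.

```python
import numpy as np

def block_wise_low_rank(W, freqs, n_clusters=5, rank=16):
    """Sketch of block-wise low-rank approximation of an embedding/softmax
    matrix W (vocab x dim): group rows (words) by frequency, then replace
    each block with a truncated SVD. GroupReduce's frequency-weighted
    objective and iterative refinement are omitted here."""
    vocab, dim = W.shape
    order = np.argsort(-freqs)                     # most frequent words first
    blocks = np.array_split(order, n_clusters)     # equal-size frequency groups

    W_hat = np.zeros_like(W)
    n_params = 0
    for rows in blocks:
        U, s, Vt = np.linalg.svd(W[rows], full_matrices=False)
        r = min(rank, len(s))
        W_hat[rows] = (U[:, :r] * s[:r]) @ Vt[:r]  # rank-r reconstruction of this block
        n_params += len(rows) * r + r * dim        # size of the two stored factors
    return W_hat, n_params

# Toy usage: random "embedding" with Zipfian word frequencies.
rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 64))
freqs = 1.0 / np.arange(1, 1001)
W_hat, n_params = block_wise_low_rank(W, freqs, n_clusters=5, rank=8)
print(n_params, np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```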
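
For the Experiment Setup row, this is a hedged sketch of the retraining schedule described there (SGD, initial learning rate 0.1, dropped by an order of magnitude whenever validation perplexity stops improving), assuming PyTorch's `ReduceLROnPlateau` as a stand-in for the paper's unspecified scheduling code. The tiny LSTM and the hard-coded perplexity values are placeholders, not the paper's PTB/OBW models or results.

```python
import torch

# Placeholder model; the paper retrains the non-approximated parameters of
# its 2-layer LSTM language model, which is not reproduced here.
model = torch.nn.LSTM(input_size=200, hidden_size=200, num_layers=2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)        # initial learning rate 0.1
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=0)             # 10x decay when val ppl stalls

fake_val_ppl = [120.0, 95.0, 90.0, 91.0, 91.5, 89.0]           # stand-in validation perplexities
for ppl in fake_val_ppl:
    # ... one epoch of retraining the remaining parameters would go here ...
    scheduler.step(ppl)                                        # decay LR when ppl does not drop
    print(optimizer.param_groups[0]["lr"])
```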