GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

Authors: Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show our method can significantly outperform traditional compression methods such as low-rank approximation and pruning.
Researcher Affiliation | Collaboration | Patrick H. Chen, UCLA, Los Angeles, CA (patrickchen@g.ucla.edu); Si Si, Google Research, Mountain View, CA (sisidaisy@google.com); Yang Li, Google Research, Mountain View, CA (liyang@google.com); Ciprian Chelba, Google Research, Mountain View, CA (ciprianchelba@google.com); Cho-Jui Hsieh, UCLA, Los Angeles, CA (chohsieh@cs.ucla.edu)
Pseudocode | Yes | Algorithm 1: GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking (a minimal sketch of the block-wise idea appears after this table).
Open Source Code | No | The paper does not provide explicit statements or links indicating that open-source code for the described methodology is available.
Open Datasets | Yes | For LM, we evaluate GroupReduce on two datasets: Penn Treebank (PTB) and the One-Billion-Word Benchmark (OBW). OBW was introduced by [2] and contains a vocabulary of 793,471 words, with the sentences shuffled and duplicates removed. For NMT, we evaluate our method on the IWSLT 2014 German-to-English translation task [1].
Dataset Splits | No | The paper mentions the use of "validation perplexity" for adjusting the learning rate ("Whenever the validation perplexity does not drop down, we decrease the learning rate to an order smaller."), indicating a validation set was used, but it does not specify the exact split percentages, sample counts, or explicit methodology for creating the training, validation, and test splits for the datasets used.
Hardware Specification | No | The paper mentions "Google Cloud and Nvidia" in the acknowledgement section, implying the use of their resources, but it does not specify any particular GPU models, CPU types, or other detailed hardware specifications used for running the experiments.
Software Dependencies | No | The paper mentions using "PyTorch" and "OpenNMT" for the NMT task but does not specify the version numbers for these software components.
Experiment Setup | Yes | We train a 2-layer LSTM-based language model on PTB from scratch with two setups: PTB-Small and PTB-Large. The LSTM hidden state sizes are 200 for PTB-Small and 1500 for PTB-Large, as are their embedding sizes. In the experiments, we set the number of clusters to 5 for the PTB and IWSLT datasets and 20 for the OBW dataset. After approximation, we retrain the remaining parameters with the SGD optimizer, using an initial learning rate of 0.1. Whenever the validation perplexity does not drop, we decrease the learning rate by an order of magnitude. (A sketch of this retraining schedule follows the table.)
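
For context on the Pseudocode row: below is a minimal sketch of the block-wise low-rank approximation named in Algorithm 1, written in plain NumPy. It is not the authors' code (none is released, per the Open Source Code row). It assumes the vocabulary rows have already been assigned to clusters, and the names blockwise_low_rank, cluster_ids, and ranks are illustrative, not taken from the paper.

    import numpy as np

    def blockwise_low_rank(W, cluster_ids, ranks):
        """Approximate each row-cluster of W by a truncated SVD of the given rank.

        W           : (vocab_size, dim) embedding or softmax weight matrix
        cluster_ids : length-vocab_size array assigning each row to a cluster
        ranks       : dict mapping cluster id -> rank kept for that block
        Returns the reconstructed matrix so the approximation error can be
        inspected; a real implementation would store the per-block factors instead.
        """
        W_hat = np.zeros_like(W)
        for c in np.unique(cluster_ids):
            rows = np.where(cluster_ids == c)[0]
            U, S, Vt = np.linalg.svd(W[rows], full_matrices=False)
            r = min(ranks[c], len(S))
            W_hat[rows] = (U[:, :r] * S[:r]) @ Vt[:r]
        return W_hat

    # Toy usage: a 1000-word vocabulary with 200-dim embeddings and 5 clusters
    # (5 clusters is the setting the paper reports for PTB and IWSLT).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((1000, 200))
    cluster_ids = rng.integers(0, 5, size=1000)
    ranks = {c: 20 for c in range(5)}
    W_hat = blockwise_low_rank(W, cluster_ids, ranks)
    print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))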
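
The learning-rate rule quoted in the Experiment Setup row ("decrease the learning rate to an order smaller" whenever validation perplexity does not drop) corresponds to dividing the rate by 10 when a validation metric stalls, which PyTorch's built-in ReduceLROnPlateau scheduler can express with factor 0.1. The snippet below is one plausible reading of that schedule, not the authors' training script; the LSTM model and the validation-perplexity value are placeholders.

    import torch

    # Placeholder model: the paper retrains the remaining parameters after the
    # embedding/softmax matrices are replaced by their block-wise low-rank factors.
    model = torch.nn.LSTM(input_size=200, hidden_size=200, num_layers=2)

    # SGD with initial learning rate 0.1, as stated in the Experiment Setup row.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Assumed reading of "decrease the learning rate to an order smaller":
    # multiply the learning rate by 0.1 each time validation perplexity
    # fails to improve.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=0)

    for epoch in range(40):
        # ... one training epoch and a validation pass would go here ...
        val_perplexity = 100.0  # placeholder; use the real validation perplexity
        scheduler.step(val_perplexity)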