LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Authors: Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LightRNN using the language modeling task on several benchmark datasets. The experimental results show that LightRNN achieves comparable (if not better) accuracy to state-of-the-art language models in terms of perplexity, while reducing the model size by a factor of up to 100 and speeding up the training process by a factor of 2. |
| Researcher Affiliation | Collaboration | Xiang Li (1), Tao Qin (2), Jian Yang (1), Tie-Yan Liu (2); (1) Nanjing University of Science and Technology, (2) Microsoft Research Asia |
| Pseudocode | No | The paper describes procedures and algorithms textually and with figures, but does not include structured pseudocode or an algorithm block with explicit labels like 'Algorithm'. |
| Open Source Code | No | Fourth, we are cleaning our codes and will release them soon through CNTK [27]. |
| Open Datasets | Yes | We used all the linguistic corpora from 2013 ACL Workshop Morphological Language Datasets (ACLW) [4] and the One-Billion-Word Benchmark Dataset (Billion W) [5] in our experiments. The detailed information of these public datasets is listed in Table 1. |
| Dataset Splits | Yes | For the ACLW datasets, we kept all the training/validation/test sets exactly the same as those in [4, 13] by using their processed data. For the Billion W dataset, since the data are unprocessed, we processed the data according to the standard procedure as listed in [5]: We discarded all words with count below 3 and padded the sentence boundary markers <S>, </S>. Words outside the vocabulary were mapped to the <UNK> token. Meanwhile, the partition of training/validation/test sets on Billion W was the same as the public settings in [5] for fair comparisons. (A hedged preprocessing sketch follows the table.) |
| Hardware Specification | Yes | All the training processes were conducted on one single GPU K20 with 5GB memory. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). It mentions 'CNTK' in the context of future code release, but not as a versioned dependency for the experiments conducted. |
| Experiment Setup | Yes | We trained LSTM-based LightRNN using stochastic gradient descent with truncated backpropagation through time [10, 25]. The initial learning rate was 1.0 and was then decreased by a ratio of 2 if the perplexity did not improve on the validation set after a certain number of mini-batches. We clipped the gradients of the parameters such that their norms were bounded by 5.0. We further performed dropout with probability 0.5 [28]. (A hedged training-loop sketch with these settings follows the table.) |
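
The Billion W preprocessing quoted in the Dataset Splits row amounts to a vocabulary cutoff, out-of-vocabulary mapping, and sentence-boundary padding. The sketch below illustrates that procedure under stated assumptions: only the count threshold of 3 and the <S>, </S>, <UNK> markers come from the paper; the function names and toy corpus are illustrative, not the authors' pipeline.

```python
from collections import Counter

MIN_COUNT = 3                      # "discarded all words with count below 3"
UNK, BOS, EOS = "<UNK>", "<S>", "</S>"

def build_vocab(sentences):
    """Keep only words whose corpus count is at least MIN_COUNT."""
    counts = Counter(w for sent in sentences for w in sent)
    vocab = {w for w, c in counts.items() if c >= MIN_COUNT}
    vocab.update({UNK, BOS, EOS})
    return vocab

def preprocess(sentences, vocab):
    """Map out-of-vocabulary words to <UNK> and pad sentence boundary markers."""
    for sent in sentences:
        yield [BOS] + [w if w in vocab else UNK for w in sent] + [EOS]

# Toy usage on whitespace-tokenized sentences (hypothetical data, for shape only).
corpus = [line.split() for line in ["the cat sat", "the dog sat", "the cat ran"]]
vocab = build_vocab(corpus)
processed = list(preprocess(corpus, vocab))
```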
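
The Experiment Setup row fixes the optimization hyperparameters but not the toolkit-level code (the authors mention a future CNTK release). A minimal PyTorch sketch of those settings, assuming a generic LSTM word-level language model in place of LightRNN's 2-component shared embedding, might look as follows; the module and function names are hypothetical.

```python
import torch
import torch.nn as nn

class WordLM(nn.Module):
    """Stand-in LSTM language model (not LightRNN's 2-component word table)."""
    def __init__(self, vocab_size=10000, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.drop = nn.Dropout(0.5)                # dropout with probability 0.5
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden):
        out, hidden = self.lstm(self.drop(self.embed(x)), hidden)
        return self.decoder(self.drop(out)), hidden

model = WordLM()
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)   # initial learning rate 1.0
criterion = nn.CrossEntropyLoss()

def train_step(inputs, targets, hidden):
    """One truncated-BPTT step with the gradient norm clipped to 5.0."""
    hidden = tuple(h.detach() for h in hidden)     # truncate backprop at the window boundary
    optimizer.zero_grad()
    logits, hidden = model(inputs, hidden)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    return loss.item(), hidden

def decay_lr_if_stalled(val_ppl, best_val_ppl):
    """Decrease the learning rate by a ratio of 2 if validation perplexity stops improving."""
    if val_ppl >= best_val_ppl:
        for group in optimizer.param_groups:
            group["lr"] /= 2.0
    return min(val_ppl, best_val_ppl)
```

The actual experiments additionally involve LightRNN's word-allocation table and were run in the authors' own toolkit; none of that machinery is reflected in this sketch.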