LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Authors: Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LightRNN using the language modeling task on several benchmark datasets. The experimental results show that LightRNN achieves comparable (if not better) accuracy to state-of-the-art language models in terms of perplexity, while reducing the model size by a factor of up to 100 and speeding up the training process by a factor of 2.
Researcher Affiliation | Collaboration | Xiang Li¹, Tao Qin², Jian Yang¹, Tie-Yan Liu² (¹Nanjing University of Science and Technology, ²Microsoft Research Asia)
Pseudocode | No | The paper describes procedures and algorithms textually and with figures, but does not include structured pseudocode or an explicitly labeled 'Algorithm' block.
Open Source Code | No | Fourth, we are cleaning our codes and will release them soon through CNTK [27].
Open Datasets | Yes | We used all the linguistic corpora from the 2013 ACL Workshop Morphological Language Datasets (ACLW) [4] and the One-Billion-Word Benchmark Dataset (Billion W) [5] in our experiments. The detailed information of these public datasets is listed in Table 1.
Dataset Splits | Yes | For the ACLW datasets, we kept all the training/validation/test sets exactly the same as those in [4, 13] by using their processed data. For the Billion W dataset, since the data are unprocessed, we processed the data according to the standard procedure listed in [5]: we discarded all words with count below 3 and padded the sentence boundary markers <S>, </S>. Words outside the vocabulary were mapped to the <UNK> token. Meanwhile, the partition of training/validation/test sets on Billion W was the same as the public settings in [5] for fair comparisons. (A preprocessing sketch following the table illustrates these steps.)
Hardware Specification | Yes | All the training processes were conducted on one single GPU K20 with 5GB memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). It mentions CNTK in the context of a future code release, but not as a versioned dependency for the experiments conducted.
Experiment Setup | Yes | We trained LSTM-based LightRNN using stochastic gradient descent with truncated backpropagation through time [10, 25]. The initial learning rate was 1.0 and was then decreased by a factor of 2 if the perplexity did not improve on the validation set after a certain number of mini-batches. We clipped the gradients of the parameters such that their norms were bounded by 5.0. We further performed dropout with probability 0.5 [28]. (A training-loop sketch following the table illustrates this schedule.)
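
The Dataset Splits row quotes a concrete preprocessing procedure for the Billion W corpus: drop words with count below 3, pad sentences with <S>/</S> boundary markers, and map out-of-vocabulary words to <UNK>. The sketch below illustrates those three steps; the function names (build_vocab, preprocess) and the toy corpus are illustrative and are not taken from the paper or its (unreleased) code.

```python
from collections import Counter

MIN_COUNT = 3  # words with count below 3 are discarded from the vocabulary


def build_vocab(sentences):
    """Count word frequencies and keep only words seen at least MIN_COUNT times."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    vocab = {word for word, count in counts.items() if count >= MIN_COUNT}
    vocab.update({"<S>", "</S>", "<UNK>"})  # boundary and unknown-word tokens
    return vocab


def preprocess(sentences, vocab):
    """Pad each sentence with <S>/</S> and replace out-of-vocabulary words by <UNK>."""
    processed = []
    for sentence in sentences:
        tokens = [word if word in vocab else "<UNK>" for word in sentence.split()]
        processed.append(["<S>"] + tokens + ["</S>"])
    return processed


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the cat ran", "a rare word"]
    vocab = build_vocab(corpus)
    print(preprocess(corpus, vocab))
```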
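
The Experiment Setup row pins down the optimization schedule: SGD with truncated backpropagation through time, an initial learning rate of 1.0 halved when validation perplexity stops improving, gradient norms clipped at 5.0, and dropout with probability 0.5. The sketch below re-expresses that schedule in PyTorch rather than the authors' CNTK implementation; the model class, batch iterators, and evaluate_perplexity helper are assumptions for illustration, and the learning rate is checked once per epoch instead of after a fixed number of mini-batches.

```python
import torch
import torch.nn as nn


class TinyLanguageModel(nn.Module):
    """Illustrative LSTM language model (not the paper's 2-component LightRNN)."""

    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.drop = nn.Dropout(p=0.5)                      # dropout with probability 0.5
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.proj(self.drop(out)), state


def train(model, train_batches, valid_batches, evaluate_perplexity, epochs=10):
    criterion = nn.CrossEntropyLoss()
    lr = 1.0                                               # initial learning rate
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    best_ppl = float("inf")
    for _ in range(epochs):
        model.train()
        state = None
        for x, y in train_batches:
            if state is not None:                          # truncated BPTT: stop gradients
                state = tuple(s.detach() for s in state)   # at chunk boundaries
            logits, state = model(x, state)
            loss = criterion(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            # clip gradients so that their norm is bounded by 5.0
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
            optimizer.step()
        ppl = evaluate_perplexity(model, valid_batches)
        if ppl >= best_ppl:                                # no improvement on validation:
            lr /= 2.0                                      # halve the learning rate
            for group in optimizer.param_groups:
                group["lr"] = lr
        best_ppl = min(best_ppl, ppl)
    return model
```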