Normalization Helps Training of Quantized LSTM
Authors: Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that the normalized quantized LSTMs achieve significantly better results than their unnormalized counterparts. (See the normalized quantized LSTM cell sketch after the table.) |
| Researcher Affiliation | Collaboration | 1 Hong Kong University of Science and Technology, Hong Kong ({lhouab,jamesk}@cse.ust.hk); 2 University of Science and Technology of China, Hefei, China (teslazhu@mail.ustc.edu.cn); 3 Microsoft Research, Beijing, China ({feiga, taoqin, tyliu}@microsoft.com) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | Experiments are performed on three benchmark data sets: (i) Leo Tolstoy's War and Peace; (ii) Penn Treebank Corpus [27]; and (iii) Text8. ... We perform experiments on the sequential version of MNIST classification, which processes one image pixel at a time. We follow the setting in [16, 4, 1], and use both the MNIST and permuted MNIST (pMNIST) [4]. (See the sequential-MNIST sketch after the table.) |
| Dataset Splits | Yes | For Penn Treebank and Text8, we use the standard split from [20] and [1] respectively. For War and Peace, we use the split provided in [12]. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions the Adam and SGD optimizers and PyTorch, but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Adam is used as the optimizer. The initial learning rate is 0.001, decayed by 0.5 every epoch if the validation loss does not decrease. The batch size is 64. The maximum sequence length is 180. We train for 50 epochs. (See the training-loop sketch after the table.) |
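The paper's central claim, reflected in the Research Type row, is that normalization stabilizes the training of quantized LSTMs. As a rough illustration only, the sketch below binarizes the weight matrices with a straight-through estimator and layer-normalizes the gate pre-activations; the class and function names are hypothetical, and the paper's exact quantization scheme and normalization placement may differ.

```python
import torch
import torch.nn as nn

def binarize(w):
    # Sign binarization with a per-tensor scale (illustrative only;
    # the paper's exact quantizer may differ).
    scale = w.abs().mean()
    wb = torch.sign(w) * scale
    # Straight-through estimator: forward uses wb, gradients flow through w.
    return w + (wb - w).detach()

class NormalizedQuantizedLSTMCell(nn.Module):
    """Hypothetical LSTM cell with binarized weights and layer-normalized
    gate pre-activations (a sketch, not the authors' code)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.weight_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.weight_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.ln = nn.LayerNorm(4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = x @ binarize(self.weight_ih).t() + h @ binarize(self.weight_hh).t()
        gates = self.ln(gates)  # normalize the quantized pre-activations
        i, f, g, o = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```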
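The Open Datasets row mentions the sequential MNIST task, which feeds one image pixel per time step. A minimal preprocessing sketch, assuming PyTorch/torchvision (the paper does not specify its data pipeline), might look like this; the pMNIST permutation step is likewise only illustrative.

```python
import torch
from torchvision import datasets, transforms

# Flatten each 28x28 MNIST image into a length-784 sequence of single pixels.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda img: img.view(-1, 1)),  # (784, 1): one pixel per time step
])
train_set = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# For permuted MNIST (pMNIST), apply one fixed random permutation of the 784
# pixel positions to every image.
perm = torch.randperm(784)
x, y = next(iter(loader))      # x: (64, 784, 1)
x_pmnist = x[:, perm, :]       # same permutation for every sample
```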
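The Experiment Setup row can be mirrored by a standard PyTorch training loop. Everything below except the reported hyperparameters (Adam, initial learning rate 0.001, halve on validation plateau, batch size 64, at most 180 time steps, 50 epochs) is a placeholder: the toy model and random data are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=1, hidden_size=128, batch_first=True)   # placeholder model
head = nn.Linear(128, 10)
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)
# Halve the learning rate whenever the validation loss fails to improve.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=0)
criterion = nn.CrossEntropyLoss()

def run_epoch(batches, train=True):
    total = 0.0
    for x, y in batches:                    # x: (64, <=180, 1), y: (64,)
        out, _ = model(x)
        loss = criterion(head(out[:, -1]), y)
        if train:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        total += loss.item()
    return total / max(len(batches), 1)

# Dummy data just to make the sketch executable end to end.
train_batches = [(torch.randn(64, 180, 1), torch.randint(0, 10, (64,))) for _ in range(5)]
val_batches = [(torch.randn(64, 180, 1), torch.randint(0, 10, (64,))) for _ in range(2)]

for epoch in range(50):
    run_epoch(train_batches, train=True)
    with torch.no_grad():
        val_loss = run_epoch(val_batches, train=False)
    scheduler.step(val_loss)                # decay by 0.5 if validation loss did not decrease
```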