Normalization Helps Training of Quantized LSTM

Authors: Lu Hou, Jinhua Zhu, James Kwok, Fei Gao, Tao Qin, Tie-Yan Liu

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that the normalized quantized LSTMs achieve significantly better results than their unnormalized counterparts.
Researcher Affiliation | Collaboration | (1) Hong Kong University of Science and Technology, Hong Kong ({lhouab,jamesk}@cse.ust.hk); (2) University of Science and Technology of China, Hefei, China (teslazhu@mail.ustc.edu.cn); (3) Microsoft Research, Beijing, China ({feiga, taoqin, tyliu}@microsoft.com)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology.
Open Datasets | Yes | Experiments are performed on three benchmark data sets: (i) Leo Tolstoy's War and Peace; (ii) Penn Treebank Corpus [27]; and (iii) Text8. ... We perform experiments on the sequential version of MNIST classification, which processes one image pixel at a time. We follow the setting in [16, 4, 1], and use both the MNIST and permuted MNIST (pMNIST) [4].
Dataset Splits | Yes | For Penn Treebank and Text8, we use the standard split from [20] and [1] respectively. For War and Peace, we use the split provided in [12].
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions the Adam and SGD optimizers and PyTorch, but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | Adam is used as the optimizer. The initial learning rate is 0.001, decayed by 0.5 every epoch if the validation loss does not decrease. The batch size is 64. The maximum sequence length is 180. We train for 50 epochs.
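The table above only records reproducibility metadata; a few of its rows can be made concrete with illustrative sketches. On the "Research Type" row, the paper's central empirical claim is that normalization makes quantized LSTMs trainable. The cell below is a purely generic PyTorch illustration of that idea, not the paper's formulation (the paper's specific quantizer and normalization placement are not reproduced here): it binarizes the weight matrices with a simple sign-and-scale straight-through quantizer and applies layer normalization to the gate pre-activations.

```python
# Generic illustration only (not the paper's exact formulation): an LSTM
# cell whose weight matrices are binarized with a sign-and-scale quantizer
# and whose gate pre-activations are layer-normalized.
import torch
import torch.nn as nn

def binarize(w):
    # Straight-through sign quantizer with a per-tensor scale (illustrative choice).
    scale = w.abs().mean()
    return (torch.sign(w) * scale - w).detach() + w  # forward: quantized, backward: identity

class NormalizedQuantizedLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w_ih = nn.Parameter(torch.randn(4 * hidden_size, input_size) * 0.1)
        self.w_hh = nn.Parameter(torch.randn(4 * hidden_size, hidden_size) * 0.1)
        self.ln = nn.LayerNorm(4 * hidden_size)  # normalization of the gate pre-activations

    def forward(self, x, state):
        h, c = state
        gates = self.ln(x @ binarize(self.w_ih).t() + h @ binarize(self.w_hh).t())
        i, f, g, o = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```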
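The "Open Datasets" row quotes the sequential MNIST setting, in which each 28x28 image is presented to the recurrent model one pixel at a time (a 784-step sequence of scalars), and pMNIST additionally applies a fixed random permutation to the pixel order. Below is a minimal torchvision sketch of that preprocessing; the permutation seed is an arbitrary choice for illustration, not a value from the paper.

```python
# Minimal sketch of the sequential MNIST / permuted MNIST (pMNIST) input
# pipeline: each 28x28 image becomes a 784-step sequence of single pixels.
# The permutation seed below is arbitrary, chosen only for this example.
import torch
from torchvision import datasets, transforms

PERM = torch.randperm(784, generator=torch.Generator().manual_seed(0))

def to_pixel_sequence(img, permute=False):
    seq = transforms.ToTensor()(img).view(784, 1)  # (seq_len=784, input_size=1)
    return seq[PERM] if permute else seq

train_mnist = datasets.MNIST(
    root="data", train=True, download=True,
    transform=lambda img: to_pixel_sequence(img, permute=False))
train_pmnist = datasets.MNIST(
    root="data", train=True, download=True,
    transform=lambda img: to_pixel_sequence(img, permute=True))
```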
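Finally, the "Experiment Setup" row gives enough detail to reconstruct a skeleton training loop: Adam with an initial learning rate of 0.001, the rate halved whenever the validation loss fails to decrease, batch size 64, sequences of at most 180 steps, and 50 epochs. The sketch below assumes placeholder `model`, `train_set`, and `validate` objects that are not provided by the paper.

```python
# Illustrative PyTorch training loop matching the reported setup
# (Adam, lr 0.001 halved on validation-loss plateau, batch size 64,
# max sequence length 180, 50 epochs). `model`, `train_set`, and
# `validate` are placeholders, not artifacts released with the paper.
import torch
from torch.utils.data import DataLoader

def train(model, train_set, validate, device="cpu"):
    loader = DataLoader(train_set, batch_size=64, shuffle=True)
    criterion = torch.nn.CrossEntropyLoss()  # stand-in task loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Halve the learning rate whenever the validation loss stops decreasing.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=0)

    for epoch in range(50):
        model.train()
        for inputs, targets in loader:  # sequences truncated to <= 180 steps upstream
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        val_loss = validate(model)   # user-supplied validation pass
        scheduler.step(val_loss)     # decay lr by 0.5 if val loss did not decrease
```

`ReduceLROnPlateau` with `patience=0` mirrors the quoted schedule of decaying by 0.5 every epoch in which the validation loss does not decrease.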