Alternating Multi-bit Quantization for Recurrent Neural Networks

Authors: Chen Xu, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang, Hongbin Zha

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the quantization for two well-known RNNs, i.e., long short term memory (LSTM) and gated recurrent unit (GRU), on the language models. Compared with the full-precision counterpart, by 2-bit quantization we can achieve 16× memory saving and 6× real inference acceleration on CPUs, with only a reasonable loss in the accuracy. By 3-bit quantization, we can achieve almost no loss in the accuracy or even surpass the original model, with 10.5× memory saving and 3× real inference acceleration. Both results beat the existing quantization works with large margins.
Researcher Affiliation | Collaboration | Chen Xu (1), Jianqiang Yao (2), Zhouchen Lin (1,3), Wenwu Ou (2), Yuanbin Cao (4), Zhirong Wang (2), Hongbin Zha (1,3). (1) Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, China; (2) Search Algorithm Team, Alibaba Group, China; (3) Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China; (4) AI-LAB, Alibaba Group, China. Emails: xuen@pku.edu.cn, tianduo@taobao.com, zlin@pku.edu.cn, santong.oww@taobao.com, lingzun.cyb@alibaba-inc.com, qingfeng@taobao.com, zha@cis.pku.edu.cn
Pseudocode | Yes | Algorithm 1: Binary Search Tree (BST) to determine the optimal code (see the quantization sketch after this table)
Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a code repository for the methodology described.
Open Datasets | Yes | We first conduct experiments on the Penn Tree Bank (PTB) corpus (Marcus et al., 1993), using the standard preprocessed splits with a 10K size vocabulary (Mikolov, 2012). The PTB dataset contains 929K training tokens, 73K validation tokens, and 82K test tokens.
Dataset Splits | Yes | The PTB dataset contains 929K training tokens, 73K validation tokens, and 82K test tokens.
Hardware Specification | Yes | We test it on an Intel Xeon E5-2682 v4 @ 2.50 GHz CPU.
Software Dependencies | No | The paper mentions using specific CPU instructions like _mm256_xor_ps and _popcnt64, and the Intel Math Kernel Library (MKL), but it does not provide specific version numbers for these or other software dependencies like programming languages, frameworks, or libraries that would be necessary for full reproducibility. (The xor/popcount arithmetic is illustrated after this table.)
Experiment Setup | Yes | The initial learning rate is set to 20. Every epoch we evaluate on the validation dataset and record the best value. When the validation error exceeds the best record, we decrease the learning rate by a factor of 1.2. Training is terminated once the learning rate is less than 0.001 or the maximum number of epochs, i.e., 80, is reached. The gradient norm is clipped in the range [-0.25, 0.25]. We unroll the network for 30 time steps and regularize it with standard dropout (probability of dropping out units equal to 0.5) (Zaremba et al., 2014). For simplicity of notation, we denote the methods using uniform, balanced, greedy, refined greedy, and our alternating quantization as Uniform, Balanced, Greedy, Refined, and Alternating, respectively. We train with a batch size of 20. (The learning-rate schedule is sketched after this table.)
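
The "Research Type" and "Pseudocode" rows above describe k-bit weight quantization and an Algorithm 1 that finds the optimal code with a binary search tree. The following NumPy sketch is an illustration under our own assumptions, not the authors' code: it greedily initializes the decomposition w ≈ Σ_i α_i b_i with b_i ∈ {−1, +1}^n, then alternates between a least-squares update of the scales and a nearest-representable-value update of the codes, with np.searchsorted (itself a binary search) standing in for the paper's BST.

```python
import itertools
import numpy as np

def greedy_init(w, k=2):
    """Greedy k-bit initialization: fit w ~ B @ alphas with B in {-1,+1}^(n,k)."""
    residual = w.astype(np.float64).copy()
    alphas, cols = [], []
    for _ in range(k):
        b = np.where(residual >= 0, 1.0, -1.0)   # sign of the current residual
        alpha = np.abs(residual).mean()          # optimal scale for that sign pattern
        alphas.append(alpha)
        cols.append(b)
        residual -= alpha * b                    # quantize what remains
    return np.array(alphas), np.stack(cols, axis=1)

def alternating_refine(w, alphas, B, n_iters=5):
    """Alternate between scale and code updates (illustrative only)."""
    k = len(alphas)
    for _ in range(n_iters):
        # Codes fixed: the scales are a least-squares solution.
        alphas, *_ = np.linalg.lstsq(B, w, rcond=None)
        # Scales fixed: snap each weight to the closest of the 2^k representable values.
        signs = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
        values = signs @ alphas
        order = np.argsort(values)
        values, signs = values[order], signs[order]
        idx = np.clip(np.searchsorted(values, w), 1, len(values) - 1)
        pick = np.where(np.abs(w - values[idx - 1]) <= np.abs(w - values[idx]),
                        idx - 1, idx)
        B = signs[pick]
    return alphas, B

w = np.random.randn(4096)
alphas, B = greedy_init(w, k=2)
alphas, B = alternating_refine(w, alphas, B)
print("relative error:", np.linalg.norm(w - B @ alphas) / np.linalg.norm(w))
# Storing k bits per weight plus a few scales instead of a 32-bit float is what
# yields the roughly 16x (k=2) and 10.5x (k=3) memory savings quoted above.
```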
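
The "Software Dependencies" row mentions the _mm256_xor_ps and _popcnt64 intrinsics used for the fast binarized kernels. As a plain-Python illustration of the arithmetic those intrinsics implement (an assumption about the kernel, not code from the paper), the dot product of two {−1, +1} vectors of length n packed into bitmasks equals n − 2·popcount(x XOR y):

```python
def pack_bits(signs):
    """Pack a {-1,+1} sequence into an integer bitmask, one bit per entry."""
    mask = 0
    for i, s in enumerate(signs):
        if s > 0:
            mask |= 1 << i
    return mask

def binary_dot(x_bits, y_bits, n):
    """Dot product of two packed {-1,+1} vectors via XOR and popcount."""
    return n - 2 * bin(x_bits ^ y_bits).count("1")

x = [1, -1, -1, 1, 1, -1, 1, 1]
y = [1, 1, -1, -1, 1, -1, -1, 1]
assert binary_dot(pack_bits(x), pack_bits(y), len(x)) == sum(a * b for a, b in zip(x, y))
```

A multi-bit matrix product can then be reduced to a weighted sum of such binary dot products, one per pair of bit-planes, which is what the vectorized xor/popcount instructions accelerate on the CPU.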
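
The "Experiment Setup" row fully specifies the learning-rate schedule, so it can be written out as a short framework-agnostic sketch; train_one_epoch and evaluate below are hypothetical placeholders rather than functions from the paper:

```python
def train_one_epoch(lr, batch_size=20, bptt_steps=30, dropout=0.5, clip=0.25):
    """Placeholder for one pass over the training tokens with the quoted settings."""
    pass

def evaluate():
    """Placeholder returning the validation perplexity."""
    return 100.0

lr, best_val = 20.0, float("inf")
for epoch in range(80):                 # at most 80 epochs
    train_one_epoch(lr=lr)
    val = evaluate()
    if val < best_val:
        best_val = val                  # record the best validation result
    else:
        lr /= 1.2                       # decay when validation error exceeds the best
    if lr < 0.001:
        break                           # terminate once the learning rate is too small
```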