HitNet: Hybrid Ternary Recurrent Neural Network
Authors: Peiqi Wang, Xinfeng Xie, Lei Deng, Guoqi Li, Dongsheng Wang, Yuan Xie
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on typical RNN models, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Overall, HitNet can quantize RNN models into ternary values of {-1, 0, 1} and significantly outperform the state-of-the-art methods towards extremely quantized RNNs. Specifically, we improve the perplexity per word (PPW) of a ternary LSTM on the Penn Tree Bank (PTB) corpus from 126 to 110.3 and a ternary GRU from 142 to 113.5. (A hedged ternarization sketch appears below the table.) |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University; (2) Beijing National Research Center for Information Science and Technology; (3) Department of Precision Instrument, Tsinghua University; (4) Department of Electrical and Computer Engineering, University of California, Santa Barbara |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | No concrete access to source code for the methodology described in this paper was provided. |
| Open Datasets | Yes | All evaluations in this section adopt an LSTM model with one hidden layer of 300 units. The sequence length is set to 35, and it is applied on the Penn Tree Bank (PTB) corpus [30]. The accuracy is measured in perplexity per word (PPW), and a lower PPW value means better accuracy. ... We first use the Penn Tree Bank (PTB) corpus [30], which contains a 10K vocabulary. (A PPW computation sketch appears below the table.) |
| Dataset Splits | No | The paper mentions 'validation error' but does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) for training, validation, or test sets. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running experiments were provided. |
| Software Dependencies | No | No specific ancillary software details, such as library names with version numbers, were provided. |
| Experiment Setup | Yes | We initialize the learning rate as 20 and decrease it by a factor of 4 at the end of an epoch if the validation error exceeds the current best record. The sequence length is set to 35 and the gradient norm is clipped into the range of [-0.25, 0.25]. In addition, we set the maximum epoch to be 40 and set the dropout rate to be 0.2. ... We use a batch size of 20 for training... We train both LSTM and GRU with one 512-size hidden layer and set the batch size to 50. ... We train the models with one hidden 1024-size layer and set the batch size to be 50. (A training-loop sketch wiring these settings together appears below the table.) |
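
The Research Type row above summarizes the paper's central claim: RNN weights are quantized to the ternary set {-1, 0, 1}. Since the paper provides no pseudocode or source code, the snippet below is only a minimal sketch of threshold-based ternarization with a straight-through gradient estimator; the threshold rule (`delta = 0.7 * mean|w|`), the `ternarize` helper, and the use of PyTorch are assumptions for illustration, not the authors' exact HitNet procedure.

```python
import torch

def ternarize(w: torch.Tensor, delta_scale: float = 0.7) -> torch.Tensor:
    """Map a float tensor to {-1, 0, +1} using a magnitude threshold.

    NOTE: this per-tensor threshold rule is an illustrative choice,
    not the quantizer defined in the HitNet paper.
    """
    delta = delta_scale * w.abs().mean()   # threshold separating 0 from +/-1
    ternary = torch.zeros_like(w)
    ternary[w > delta] = 1.0
    ternary[w < -delta] = -1.0
    return ternary

class TernaryWeight(torch.autograd.Function):
    """Straight-through estimator: ternary values forward, identity gradient backward."""

    @staticmethod
    def forward(ctx, w):
        return ternarize(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients through to the full-precision weights

# Usage: quantize an LSTM weight matrix before it enters the matrix multiply.
w = torch.randn(300, 300, requires_grad=True)
w_t = TernaryWeight.apply(w)   # entries of w_t are now in {-1, 0, 1}
```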
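The accuracy metric quoted in the Open Datasets row, perplexity per word (PPW), is the standard exponentiated per-token cross-entropy used for PTB language modeling; a lower value is better. The helper below is a generic sketch of that computation (the function name and the PyTorch framing are mine, not the paper's):

```python
import torch
import torch.nn.functional as F

def perplexity_per_word(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """PPW = exp(mean negative log-likelihood per target word).

    logits:  (num_tokens, vocab_size) unnormalized scores
    targets: (num_tokens,) gold word indices
    """
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return torch.exp(nll).item()

# Example: random logits over the ~10K-word PTB vocabulary yield a PPW on the
# order of the vocabulary size, whereas the paper's ternary LSTM reaches 110.3.
logits = torch.randn(35 * 20, 10000)          # sequence length 35, batch size 20
targets = torch.randint(0, 10000, (35 * 20,))
print(perplexity_per_word(logits, targets))
```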
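The Experiment Setup row lists the reported PTB hyperparameters for the 300-unit LSTM: initial learning rate 20, decay by a factor of 4 when the validation record is not beaten, gradient norm clipping at 0.25, 40 epochs, dropout 0.2, sequence length 35, and batch size 20. A minimal training-loop skeleton wiring those numbers together might look like the following; the model structure, the stub data iterator, and the plain SGD update are assumptions, since the paper does not state the optimizer or release code.

```python
import math
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper's PTB setup.
HIDDEN, SEQ_LEN, BATCH, VOCAB = 300, 35, 20, 10000
LR, DECAY, CLIP, EPOCHS, DROPOUT = 20.0, 4.0, 0.25, 40, 0.2

class WordLM(nn.Module):
    """One-layer, 300-unit LSTM language model (layer sizes from the paper)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.drop = nn.Dropout(DROPOUT)
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, num_layers=1, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, x):
        out, _ = self.lstm(self.drop(self.embed(x)))
        return self.head(self.drop(out))

def random_batches(n):
    """Stub iterator yielding random (input, target) pairs in place of real PTB data."""
    for _ in range(n):
        yield (torch.randint(0, VOCAB, (BATCH, SEQ_LEN)),
               torch.randint(0, VOCAB, (BATCH, SEQ_LEN)))

model, criterion = WordLM(), nn.CrossEntropyLoss()
lr, best_ppw = LR, float("inf")

for epoch in range(EPOCHS):
    model.train()
    for inputs, targets in random_batches(5):
        loss = criterion(model(inputs).flatten(0, 1), targets.flatten())
        model.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), CLIP)   # gradient norm clipped at 0.25
        with torch.no_grad():
            for p in model.parameters():                     # plain SGD step (optimizer assumed)
                p -= lr * p.grad

    model.eval()
    with torch.no_grad():
        x, y = next(random_batches(1))
        val_ppw = math.exp(criterion(model(x).flatten(0, 1), y.flatten()).item())
    if val_ppw > best_ppw:
        lr /= DECAY        # decay the learning rate by 4 when validation does not improve
    else:
        best_ppw = val_ppw
```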