Reading selectively via Binary Input Gated Recurrent Unit
Authors: Zhe Li, Peisong Wang, Hanqing Lu, Jian Cheng
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct document classification task and language modeling task on 6 different datasets to verify our model and our model achieves better performance. |
| Researcher Affiliation | Academia | Zhe Li¹, Peisong Wang¹·², Hanqing Lu¹ and Jian Cheng¹·² (¹ National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; ² Center for Excellence in Brain Science and Intelligence Technology) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code. |
| Open Datasets | Yes | We conduct document classification task and language modeling task on 6 different datasets... including Stanford Sentiment Treebank (SST), IMDb, AGNews and DBPedia. ... Penn Treebank (PTB) and Wiki Text-2 dataset. |
| Dataset Splits | Yes | Table 1: Statistics of the classification datasets that BIGRU is evaluated on, where SST refers to Stanford Sentiment Treebank. Train / validation / test splits: SST (Sentiment Analysis, Pos/Neg) 6,920 / 872 / 1,821; IMDb (Sentiment Analysis, Pos/Neg) 21,143 / 3,857 / 25,000; AGNews (News Classification, 4 categories) 101,851 / 18,149 / 7,600; DBPedia (Topic Classification, 14 categories) 475,999 / 84,000 / 69,999 |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | Classification: For both GRU and BIGRU, we use a stacked three-layer RNN. Each word is embedded into a 100-dimensional vector. All models are trained with Adam, with the initial learning rate of 0.0001. We set gradient clip to 2.0. We use batch size of 32 for SST and 128 for the remaining. For both models, we set an early stop if the validation accuracy does not increase for 1000 global steps. Language modeling: We use an initial learning rate of 10 for all experiments and carry out gradient clipping with maximum norm 0.25. We use a batch size of 80 for Wiki Text-2 and 40 for PTB. We train 1000 epochs for the small model and 2000 epochs for the large model. (A hedged configuration sketch based on the reported classification settings appears below the table.) |
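
Since the paper releases no code, the sketch below is only a rough illustration of the reported classification setup: it wires the quoted hyperparameters (a stacked three-layer recurrent network, 100-dimensional word embeddings, Adam with learning rate 0.0001, gradient-norm clipping at 2.0) into standard PyTorch modules. The BIGRU cell's binary input gate is not reproduced; a plain `nn.GRU` stands in for it, and the vocabulary size, hidden size, and class count are assumptions, not values stated in the paper.

```python
# Hypothetical sketch of the reported classification setup.
# A plain nn.GRU stands in for the paper's BIGRU cell.
import torch
import torch.nn as nn

VOCAB_SIZE = 30_000   # assumption: vocabulary size is not stated in the table above
NUM_CLASSES = 2       # e.g. SST / IMDb sentiment (Pos/Neg)

class StackedGRUClassifier(nn.Module):
    def __init__(self, vocab_size=VOCAB_SIZE, embed_dim=100,
                 hidden_dim=100, num_layers=3, num_classes=NUM_CLASSES):
        super().__init__()
        # "Each word is embedded into a 100-dimensional vector."
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # "a stacked three-layer RNN" -- hidden size 100 is an assumption.
        self.rnn = nn.GRU(embed_dim, hidden_dim,
                          num_layers=num_layers, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, 100)
        _, hidden = self.rnn(embedded)         # hidden: (num_layers, batch, hidden_dim)
        return self.classifier(hidden[-1])     # classify from the top layer's final state

model = StackedGRUClassifier()
# "trained with Adam, with the initial learning rate of 0.0001"
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(batch_tokens, batch_labels):
    """One update with the reported gradient clip of 2.0."""
    optimizer.zero_grad()
    loss = criterion(model(batch_tokens), batch_labels)
    loss.backward()
    # "We set gradient clip to 2.0."
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)
    optimizer.step()
    return loss.item()

# Batch size 32 for SST (128 for the other classification datasets) and early
# stopping after 1000 global steps without validation-accuracy improvement
# would wrap around this step function in the full training loop.
```

The language modeling experiments (learning rate 10, gradient clipping at 0.25, batch size 80 for Wiki Text-2 and 40 for PTB) would use a different optimizer configuration, which the paper does not specify beyond the quoted settings.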