Reading selectively via Binary Input Gated Recurrent Unit

Authors: Zhe Li, Peisong Wang, Hanqing Lu, Jian Cheng

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct document classification task and language modeling task on 6 different datasets to verify our model and our model achieves better performance.
Researcher Affiliation | Academia | Zhe Li¹, Peisong Wang¹,², Hanqing Lu¹ and Jian Cheng¹,²; ¹National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; ²Center for Excellence in Brain Science and Intelligence Technology
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code.
Open Datasets | Yes | We conduct document classification task and language modeling task on 6 different datasets... including Stanford Sentiment Treebank (SST), IMDb, AGNews and DBPedia. ... Penn Treebank (PTB) and Wiki Text-2 dataset.
Dataset Splits | Yes | Table 1: Statistics of the classification datasets that BIGRU is evaluated on, where SST refers to Stanford Sentiment Treebank. Train / validation / test sizes: SST (Sentiment Analysis, Pos/Neg) 6,920 / 872 / 1,821; IMDb (Sentiment Analysis, Pos/Neg) 21,143 / 3,857 / 25,000; AGNews (News Classification, 4 categories) 101,851 / 18,149 / 7,600; DBPedia (Topic Classification, 14 categories) 475,999 / 84,000 / 69,999
Hardware Specification | No | The paper does not provide specific hardware details for running its experiments.
Software Dependencies | No | The paper does not specify software dependencies or version numbers.
Experiment Setup | Yes | For both GRU and BIGRU, we use a stacked three-layer RNN. Each word is embedded into a 100-dimensional vector. All models are trained with Adam, with the initial learning rate of 0.0001. We set gradient clip to 2.0. We use batch size of 32 for SST and 128 for the remaining. For both models, we set an early stop if the validation accuracy does not increase for 1000 global steps. ... We use an initial learning rate of 10 for all experiments and carry out gradient clipping with maximum norm 0.25. We use a batch size of 80 for Wiki Text-2 and 40 for PTB. We train 1000 epochs for the small model and 2000 epochs for the large model.
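
To make the reported classification hyperparameters concrete, here is a minimal PyTorch sketch that assembles them into a single training step: a stacked three-layer recurrent model, 100-dimensional word embeddings, Adam with an initial learning rate of 0.0001, and gradient clipping at 2.0. This is not the authors' code (none is released): the hidden size, vocabulary size, class count, and the reading of "gradient clip 2.0" as a max-norm clip are assumptions, and a plain GRU stands in for the BIGRU cell, whose binary input gating is not reproduced here.

```python
# Sketch of the classification training setup described in the Experiment Setup
# row. Placeholders/assumptions: vocab_size, num_classes, hidden_dim, and the
# interpretation of "gradient clip 2.0" as max-norm clipping. The BIGRU gating
# mechanism itself is NOT implemented; a standard GRU stack is used instead.
import torch
import torch.nn as nn

class StackedGRUClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, embed_dim=100, hidden_dim=100, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # 100-dim word embeddings
        self.rnn = nn.GRU(embed_dim, hidden_dim,
                          num_layers=num_layers, batch_first=True)  # stacked three-layer RNN
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):
        x = self.embed(tokens)          # (batch, seq_len, embed_dim)
        _, h_n = self.rnn(x)            # h_n: (num_layers, batch, hidden_dim)
        return self.fc(h_n[-1])         # classify from the top layer's final state

model = StackedGRUClassifier(vocab_size=30000, num_classes=2)   # placeholder sizes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)       # "initial learning rate of 0.0001"
criterion = nn.CrossEntropyLoss()

def train_step(tokens, labels):
    optimizer.zero_grad()
    loss = criterion(model(tokens), labels)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)  # "gradient clip to 2.0"
    optimizer.step()
    return loss.item()
```

The batch sizes (32 for SST, 128 for the other classification datasets) and the 1000-step early-stopping rule on validation accuracy would sit in the data loader and outer training loop, which are omitted here; the language-modeling runs use a different optimizer configuration (initial learning rate 10, clipping at norm 0.25) and are not covered by this sketch.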