Words or Characters? Fine-grained Gating for Reading Comprehension

Authors: Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS: We first present experimental results on the Twitter dataset where we can rule out the effects of different choices of network architectures, to demonstrate the effectiveness of our word-character fine-grained gating approach. Later we show experiments on more challenging datasets on reading comprehension to further show that our approach can be used to improve the performance on high-level NLP tasks as well.
Researcher Affiliation | Academia | Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov, School of Computer Science, Carnegie Mellon University, {zhiliny,wcohen,rsalakhu}@cs.cmu.edu
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/kimiyoung/fg-gating
Open Datasets | Yes | The Twitter dataset consists of English tweets with at least one hashtag from Twitter... The Children's Book Test (CBT) dataset is built from children's books (Hill et al., 2016). ... The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset collected recently (Rajpurkar et al., 2016).
Dataset Splits | Yes | The Twitter dataset contains 2 million tweets for training, 10K for validation and 50K for testing... The Children's Book Test dataset has 669,343 questions for training, 8,000 for validation and 10,000 for testing... The Stanford Question Answering Dataset (SQuAD) is partitioned into a training set (80%, 87,636 question-answer pairs), a development set (10%, 10,600 question-answer pairs)...
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not name the ancillary software (e.g., libraries or solvers) with version numbers that would be needed to replicate the experiments.
Experiment Setup | Yes | For the fine-grained gating approach, we use the same hyper-parameters as in Dhingra et al. (2016a) except that we use a character-level GRU with 100 units to be of the same size as the word lookup table. The word embeddings are updated during training.
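
To make the quoted setup concrete, the sketch below renders the paper's word-character fine-grained gate, g = sigmoid(W_g v + b_g) and h = g * c + (1 - g) * w, in PyTorch. This is a minimal illustration under stated assumptions, not the released fg-gating code: the class name, tensor shapes, and feature dimension are hypothetical, and a single-direction GRU stands in for the paper's character-level RNN; only the 100-unit size (matching the word lookup table) comes from the quoted setup.

```python
import torch
import torch.nn as nn

class FineGrainedGate(nn.Module):
    """Illustrative word-character fine-grained gate (not the released code)."""

    def __init__(self, feature_dim: int, emb_dim: int = 100):
        super().__init__()
        # Character-level GRU with 100 units, sized to match the 100-d word
        # lookup table as described in the quoted experiment setup.
        self.char_gru = nn.GRU(input_size=emb_dim, hidden_size=emb_dim,
                               batch_first=True)
        # Per-dimension gate g = sigmoid(W_g v + b_g) computed from token
        # features v (e.g., POS, NER, document frequency), per the paper.
        self.gate = nn.Linear(feature_dim, emb_dim)

    def forward(self, word_emb: torch.Tensor, char_embs: torch.Tensor,
                features: torch.Tensor) -> torch.Tensor:
        # word_emb:  (batch, 100)           word-level embedding w
        # char_embs: (batch, n_chars, 100)  character embeddings of the token
        # features:  (batch, feature_dim)   token feature vector v
        _, h_n = self.char_gru(char_embs)      # h_n: (1, batch, 100)
        c = h_n.squeeze(0)                     # character-level representation c
        g = torch.sigmoid(self.gate(features))
        return g * c + (1.0 - g) * word_emb    # h = g * c + (1 - g) * w
```

For example, a batch of 32 tokens with 20-d features would be combined via FineGrainedGate(feature_dim=20)(word_emb, char_embs, features), yielding one 100-d gated representation per token; the element-wise gate lets each dimension interpolate independently between the character-level and word-level views of the token.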