Words or Characters? Fine-grained Gating for Reading Comprehension
Authors: Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS We first present experimental results on the Twitter dataset where we can rule out the effects of different choices of network architectures, to demonstrate the effectiveness of our word-character fine-grained gating approach. Later we show experiments on more challenging datasets on reading comprehension to further show that our approach can be used to improve the performance on high-level NLP tasks as well. |
| Researcher Affiliation | Academia | Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov School of Computer Science Carnegie Mellon University {zhiliny,wcohen,rsalakhu}@cs.cmu.edu |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/kimiyoung/fg-gating |
| Open Datasets | Yes | The Twitter dataset consists of English tweets with at least one hashtag from Twitter... The Children's Book Test (CBT) dataset is built from children's books (Hill et al., 2016). ... The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset collected recently (Rajpurkar et al., 2016). |
| Dataset Splits | Yes | The Twitter dataset contains 2 million tweets for training, 10K for validation and 50K for testing... The Children's Book Test dataset has 669,343 questions for training, 8,000 for validation and 10,000 for testing... The Stanford Question Answering Dataset (SQuAD) is partitioned into a training set (80%, 87,636 question-answer pairs), a development set (10%, 10,600 question-answer pairs)... |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For the fine-grained gating approach, we use the same hyper-parameters as in Dhingra et al. (2016a) except that we use a character-level GRU with 100 units to be of the same size as the word lookup table. The word embeddings are updated during training. |
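The word-character fine-grained gating the paper proposes combines a token's word embedding and its character-level representation through a per-dimension sigmoid gate conditioned on token features. A minimal sketch of that combination rule, with hypothetical shapes and parameter names (the paper's evidence above notes a 100-unit character GRU sized to match the word lookup table; the gate parameters `W_g`, `b_g` and the feature vector here are illustrative, not taken from the released code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fine_grained_gate(word_emb, char_emb, features, W_g, b_g):
    """Per-dimension convex combination of word- and char-level vectors.

    The gate g is computed from token-level features, so each embedding
    dimension can independently prefer the word or the character pathway.
    """
    g = sigmoid(features @ W_g + b_g)          # shape (d,), values in (0, 1)
    return g * char_emb + (1.0 - g) * word_emb  # elementwise gating

# Toy example: d = 100 matches the report's note that the character-level
# GRU has 100 units, the same size as the word lookup table.
rng = np.random.default_rng(0)
d, num_feats = 100, 8
word_emb = rng.standard_normal(d)
char_emb = rng.standard_normal(d)
features = rng.standard_normal(num_feats)      # e.g. POS/NER/frequency features
W_g = 0.1 * rng.standard_normal((num_feats, d))
b_g = np.zeros(d)

h = fine_grained_gate(word_emb, char_emb, features, W_g, b_g)
```

Because the gate is a sigmoid, each output dimension lies between the corresponding word and character values, so neither pathway is ever discarded outright.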