Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Words or Characters? Fine-grained Gating for Reading Comprehension
Authors: Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
ICLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS. We first present experimental results on the Twitter dataset where we can rule out the effects of different choices of network architectures, to demonstrate the effectiveness of our word-character fine-grained gating approach. Later we show experiments on more challenging datasets on reading comprehension to further show that our approach can be used to improve the performance on high-level NLP tasks as well. |
| Researcher Affiliation | Academia | Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov School of Computer Science Carnegie Mellon University EMAIL |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/kimiyoung/fg-gating |
| Open Datasets | Yes | The Twitter dataset consists of English tweets with at least one hashtag from Twitter... The Children's Book Test (CBT) dataset is built from children's books (Hill et al., 2016). ... The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset collected recently (Rajpurkar et al., 2016). |
| Dataset Splits | Yes | The Twitter dataset contains 2 million tweets for training, 10K for validation and 50K for testing... The Children's Book Test dataset has 669,343 questions for training, 8,000 for validation and 10,000 for testing... The Stanford Question Answering Dataset (SQuAD) is partitioned into a training set (80%, 87,636 question-answer pairs), a development set (10%, 10,600 question-answer pairs)... |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For the fine-grained gating approach, we use the same hyper-parameters as in Dhingra et al. (2016a) except that we use a character-level GRU with 100 units to be of the same size as the word lookup table. The word embeddings are updated during training. |