Teaching Machines to Read and Comprehend

Authors: Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, Phil Blunsom

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure. Experimental results are reported in Table 5, with the Attentive and Impatient Readers performing best across both datasets.
Researcher Affiliation | Collaboration | Google DeepMind and University of Oxford; {kmh,tkocisky,etg,lespeholt,wkay,mustafasul,pblunsom}@google.com
Pseudocode | No | The paper presents mathematical equations for its models (e.g., LSTM cell components, the attention calculation) but does not include any structured pseudocode or algorithm blocks; a sketch of the attention step appears after this table.
Open Source Code | Yes | Code to replicate our datasets and to apply this method to other sources is available online. (http://www.github.com/deepmind/rc-data/)
Open Datasets | Yes | Using this approach we have collected two new corpora of roughly a million news stories with associated queries from the CNN and Daily Mail websites. Code to replicate our datasets and to apply this method to other sources is available online. (http://www.github.com/deepmind/rc-data/)
Dataset Splits | Yes | Table 1: Corpus statistics. Articles were collected starting in April 2007 for CNN and June 2010 for the Daily Mail, both until the end of April 2015. Validation data is from March, test data from April 2015. (A sketch of this date-based split appears after this table.)
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run the experiments; it details only software aspects and hyperparameters.
Software Dependencies | No | The paper names the model families and optimiser it relies on (e.g., deep LSTMs, RMSProp) but does not provide version numbers for any software dependencies (e.g., 'PyTorch 1.9', 'TensorFlow 2.x').
Experiment Setup | Yes | All model hyperparameters were tuned on the respective validation sets of the two corpora. For the Deep LSTM Reader, we consider hidden layer sizes [64, 128, 256], depths [1, 2, 4], initial learning rates [1E-3, 5E-4, 1E-4, 5E-5], batch sizes [16, 32] and dropout [0.0, 0.1, 0.2]. We evaluate two types of feeds: in the cqa setup we feed first the context document and subsequently the question into the encoder, while the qca model starts by feeding in the question followed by the context document. We report results on the best model (underlined hyperparameters, qca setup). For the attention models we consider hidden layer sizes [64, 128, 256], a single layer, initial learning rates [1E-4, 5E-5, 2.5E-5, 1E-5], batch sizes [8, 16, 32] and dropout [0, 0.1, 0.2, 0.5]. For all models we used asynchronous RMSProp [20] with a momentum of 0.9 and a decay of 0.95. See Appendix A for further details of the experimental setup. (The resulting grid search is sketched after this table.)
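
The Pseudocode row notes that the paper gives equations rather than algorithm blocks. As an illustration of the attention step those equations describe, here is a minimal NumPy sketch of the Attentive Reader's attention, following m(t) = tanh(W_ym y(t) + W_um u), s(t) ∝ exp(w_ms^T m(t)) and r = Y^T s; all function and variable names are ours, since the authors did not release a model implementation.

    import numpy as np

    def attentive_reader_attention(Y, u, W_ym, W_um, w_ms):
        # Y: (T, d) document token encodings y(t) from the bidirectional LSTM
        # u: (d,) query encoding; W_ym, W_um: (d, d) projections; w_ms: (d,)
        M = np.tanh(Y @ W_ym.T + u @ W_um.T)  # m(t) = tanh(W_ym y(t) + W_um u)
        scores = M @ w_ms                     # w_ms^T m(t) for every position t
        s = np.exp(scores - scores.max())
        s = s / s.sum()                       # softmax: s(t) proportional to exp(...)
        r = s @ Y                             # attention-weighted document vector
        return r, s

The paper then combines r with the query encoding through a further non-linearity, g(d, q) = tanh(W_rg r + W_ug u), to produce the joint document-query embedding; that step is omitted here.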
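
The Dataset Splits row describes a split by publication month rather than a random shuffle: validation data comes from March 2015, test data from April 2015, and earlier articles form the training set. A minimal sketch of that rule, assuming a hypothetical representation of articles as (publication date, story) pairs:

    from datetime import date

    def split_by_date(articles):
        # articles: iterable of (published: date, story) pairs (hypothetical format)
        train, valid, test = [], [], []
        for published, story in articles:
            if published >= date(2015, 4, 1):
                test.append(story)    # April 2015 -> test
            elif published >= date(2015, 3, 1):
                valid.append(story)   # March 2015 -> validation
            else:
                train.append(story)   # everything earlier -> training
        return train, valid, test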
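
Finally, the Experiment Setup row amounts to a grid search over the quoted hyperparameter ranges. The sketch below enumerates the Deep LSTM Reader grid; train_and_eval is a hypothetical stand-in for training a configuration with asynchronous RMSProp (momentum 0.9, decay 0.95) and scoring it on the validation set.

    import itertools

    grid = {
        "hidden_size": [64, 128, 256],
        "depth": [1, 2, 4],
        "learning_rate": [1e-3, 5e-4, 1e-4, 5e-5],
        "batch_size": [16, 32],
        "dropout": [0.0, 0.1, 0.2],
        "feed_order": ["cqa", "qca"],  # context-then-query vs. query-then-context
    }

    configs = [dict(zip(grid, values))
               for values in itertools.product(*grid.values())]
    print(len(configs))  # 432 candidate Deep LSTM Reader configurations
    # for config in configs:
    #     score = train_and_eval(config)  # hypothetical train/validate helper

The attention models follow the same pattern with the single-layer grid quoted above (learning rates down to 1E-5, batch sizes [8, 16, 32], dropout up to 0.5).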