reproducibilityindex.ai

Capturing Greater Context for Question Generation

Authors: Luu Anh Tuan, Darsh Shah, Regina Barzilay | Pages: 9065-9072

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our hypothesis of using a controllable context to generate questions on three different QA datasets: SQuAD, MS MARCO, and NewsQA. Our method strongly outperforms existing state-of-the-art models by an average absolute increase of 1.56 Rouge, 0.97 Meteor and 0.81 Bleu scores over the previous best reported results on all three datasets.
Researcher Affiliation	Academia	Luu Anh Tuan, Darsh J Shah, Regina Barzilay. Computer Science and Artificial Intelligence Lab, MIT. {tuanluu, darsh, regina}@csail.mit.edu
Pseudocode	No	The paper describes the model architecture and decoding process with equations and textual descriptions, but does not include formal pseudocode or an algorithm block.
Open Source Code	Yes	Our code and data are available at https://github.com/vivisimo/QuestionGeneration
Open Datasets	Yes	We evaluate our model on 3 question answering datasets: SQuAD (Rajpurkar et al. 2016), MS Marco (Bajaj et al. 2016) and NewsQA (Trischler et al. 2016). These form a comprehensive set of datasets to evaluate question generation.
Dataset Splits	Yes	Table 1: Description of the evaluation datasets. l_D, l_Q and l_A stand for average length of document, question and answer respectively.
		Dataset	Train	Dev	Test	l_D	l_Q	l_A
		SQuAD-1	87,488	5,267	5,272	126	11	3
		SQuAD-2	77,739	9,749	10,540	127	11	3
		MS Marco	51,000	6,000	7,000	60	6	15
		NewsQA	76,560	4,341	4,292	583	8	5
Hardware Specification	No	The paper does not mention any specific hardware (GPU/CPU models, memory, etc.) used for training or experiments.
Software Dependencies	No	The paper mentions software components like "Bidirectional LSTM", "Adam optimizer", "GloVe vectors", and "NLTK" but does not provide specific version numbers for any of these dependencies.
Experiment Setup	Yes	We use a one-layer Bidirectional LSTM with hidden dimension size of 512 for the encoder and decoder. Our entire model is trained end-to-end, with batch size 64, maximum of 200k steps, and Adam optimizer with a learning rate of 0.001 and L2 regularization set to 10^-6. We initialize our word embeddings with frozen pre-trained GloVe vectors (Pennington, Socher, and Manning 2014). Text is lowercased and tokenized with NLTK. We tune the step of biattention used in encoder from {1, 2, 3} on the development set. During decoding, we used beam search with the beam size of 10, and stopped decoding when every beam in the stack generates the <EOS> token.
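
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which may help when attempting a re-run. The following is a minimal sketch in plain Python, not the authors' code: the class name, field names, and helper functions are all hypothetical, and the L2 coefficient is read as 10^-6 on the assumption that the exponent sign was lost in extraction.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence

EOS = "<EOS>"  # end-of-sequence token named in the setup description


@dataclass(frozen=True)
class ExperimentConfig:
    """Reported settings; all field names are illustrative assumptions."""
    lstm_layers: int = 1             # one-layer LSTM
    bidirectional: bool = True       # bidirectional encoder/decoder LSTM
    hidden_size: int = 512
    batch_size: int = 64
    max_steps: int = 200_000         # "maximum of 200k steps"
    learning_rate: float = 1e-3      # Adam
    l2_regularization: float = 1e-6  # reported L2 value, read here as 10^-6
    freeze_embeddings: bool = True   # frozen pre-trained GloVe vectors
    lowercase: bool = True           # text is lowercased before tokenization
    beam_size: int = 10
    biattention_steps: int = 1       # tuned on the development set


# Search space for the biattention step count, as stated in the setup row.
BIATTENTION_GRID = (1, 2, 3)


def candidate_configs(base: ExperimentConfig = ExperimentConfig()) -> List[ExperimentConfig]:
    """One config per biattention step value to compare on the dev set."""
    return [replace(base, biattention_steps=k) for k in BIATTENTION_GRID]


def all_beams_finished(beams: Sequence[Sequence[str]]) -> bool:
    """Stopping rule: every beam's latest token is the EOS token."""
    return all(len(beam) > 0 and beam[-1] == EOS for beam in beams)
```

Each entry returned by `candidate_configs()` would be trained and scored on the development split, keeping the best-scoring step count. `all_beams_finished` only illustrates the stated stopping condition; the paper's excerpt does not specify how finished beams are scored or length-normalized.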