Capturing Greater Context for Question Generation
Authors: Luu Anh Tuan, Darsh Shah, Regina Barzilay
AAAI 2020, pp. 9065-9072
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our hypothesis of using a controllable context to generate questions on three different QA datasets: SQuAD, MS MARCO, and NewsQA. Our method strongly outperforms existing state-of-the-art models by an average absolute increase of 1.56 Rouge, 0.97 Meteor and 0.81 Bleu scores over the previous best reported results on all three datasets. |
| Researcher Affiliation | Academia | Luu Anh Tuan, Darsh J Shah, Regina Barzilay Computer Science and Artificial Intelligence Lab, MIT {tuanluu, darsh, regina}@csail.mit.edu |
| Pseudocode | No | The paper describes the model architecture and decoding process with equations and textual descriptions, but does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | Our code and data are available at https://github.com/vivisimo/QuestionGeneration |
| Open Datasets | Yes | We evaluate our model on 3 question answering datasets: SQuAD (Rajpurkar et al. 2016), MS Marco (Bajaj et al. 2016) and NewsQA (Trischler et al. 2016). These form a comprehensive set of datasets to evaluate question generation. |
| Dataset Splits | Yes | Table 1 of the paper reports the splits and average lengths (l_D, l_Q, l_A denote average document, question, and answer length): SQuAD-1: 87,488 train / 5,267 dev / 5,272 test, l_D=126, l_Q=11, l_A=3; SQuAD-2: 77,739 / 9,749 / 10,540, l_D=127, l_Q=11, l_A=3; MS Marco: 51,000 / 6,000 / 7,000, l_D=60, l_Q=6, l_A=15; NewsQA: 76,560 / 4,341 / 4,292, l_D=583, l_Q=8, l_A=5. These statistics are restated as a Python dictionary below the table. |
| Hardware Specification | No | The paper does not mention any specific hardware (GPU/CPU models, memory, etc.) used for training or experiments. |
| Software Dependencies | No | The paper mentions software components like "Bidirectional LSTM", "Adam optimizer", "GloVe vectors", and "NLTK" but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | We use a one-layer Bidirectional LSTM with hidden dimension size of 512 for the encoder and decoder. Our entire model is trained end-to-end, with batch size 64, maximum of 200k steps, and Adam optimizer with a learning rate of 0.001 and L2 regularization set to 10^-6. We initialize our word embeddings with frozen pre-trained GloVe vectors (Pennington, Socher, and Manning 2014). Text is lowercased and tokenized with NLTK. We tune the step of biattention used in encoder from {1, 2, 3} on the development set. During decoding, we used beam search with the beam size of 10, and stopped decoding when every beam in the stack generates the <EOS> token. (A minimal configuration sketch in PyTorch appears after the table.) |
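
The Table 1 statistics quoted in the Dataset Splits row, restated as a Python dictionary for convenience; the key and field names (`train`, `dev`, `test`, `avg_len`) are illustrative labels, not a schema from the authors' code.

```python
# Table 1 statistics quoted above, restated as a dictionary.
# Field names are illustrative, not taken from the released code.
DATASET_SPLITS = {
    "SQuAD-1":  {"train": 87_488, "dev": 5_267, "test": 5_272,
                 "avg_len": {"doc": 126, "question": 11, "answer": 3}},
    "SQuAD-2":  {"train": 77_739, "dev": 9_749, "test": 10_540,
                 "avg_len": {"doc": 127, "question": 11, "answer": 3}},
    "MS Marco": {"train": 51_000, "dev": 6_000, "test": 7_000,
                 "avg_len": {"doc": 60, "question": 6, "answer": 15}},
    "NewsQA":   {"train": 76_560, "dev": 4_341, "test": 4_292,
                 "avg_len": {"doc": 583, "question": 8, "answer": 5}},
}
```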
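The sketch below is a minimal PyTorch restatement of the quoted Experiment Setup row. Only the hyperparameter values (hidden size 512, batch size 64, 200k steps, Adam with lr 0.001 and L2 regularization 10^-6, frozen GloVe embeddings, NLTK lowercased tokenization, beam size 10) come from the excerpt; the class and variable names (`Encoder`, `glove_weights`, `tokenize`) are assumptions, and the paper's biattention step, decoder, and beam search are not reproduced here.

```python
# Minimal sketch of the quoted training configuration (not the authors' released code).
import torch
import torch.nn as nn
from nltk.tokenize import word_tokenize  # requires the NLTK 'punkt' tokenizer data

HIDDEN_SIZE = 512        # one-layer bidirectional LSTM hidden dimension
BATCH_SIZE = 64
MAX_STEPS = 200_000
LEARNING_RATE = 1e-3
L2_REG = 1e-6            # L2 regularization, applied here as Adam weight decay
BEAM_SIZE = 10           # beam width used during decoding, stopping at <EOS>


def tokenize(text):
    """Lowercase and tokenize text with NLTK, as described in the excerpt."""
    return word_tokenize(text.lower())


class Encoder(nn.Module):
    """One-layer BiLSTM over frozen pre-trained GloVe embeddings (assumed 300-d)."""

    def __init__(self, glove_weights: torch.Tensor):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.lstm = nn.LSTM(glove_weights.size(1), HIDDEN_SIZE, num_layers=1,
                            bidirectional=True, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        outputs, _ = self.lstm(self.embed(token_ids))
        return outputs  # shape: (batch, seq_len, 2 * HIDDEN_SIZE)


# Placeholder GloVe matrix (50k-word vocabulary); in practice these weights would
# be loaded from the pre-trained GloVe vectors and kept frozen during training.
glove_weights = torch.randn(50_000, 300)
encoder = Encoder(glove_weights)
optimizer = torch.optim.Adam(
    (p for p in encoder.parameters() if p.requires_grad),
    lr=LEARNING_RATE, weight_decay=L2_REG,
)
```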