Generative Question Answering: Learning to Answer the Whole Question
Authors: Mike Lewis, Angela Fan
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 (Experiments); Table 1: Exact Match (EM) and F1 on SQUAD, comparing to the best published single models at the time of submission (September 2018). (An EM/F1 scoring sketch follows the table.) |
| Researcher Affiliation | Industry | Mike Lewis & Angela Fan, Facebook AI Research, {mikelewis,angelafan}@fb.com |
| Pseudocode | No | No pseudocode or algorithm blocks found in the paper. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate our model (GQA) on the SQUAD dataset to test its robustness to diverse syntactic and lexical inferences. Results are shown in Table 1... We evaluate the ability of our model to perform multihop reasoning on the CLEVR dataset, which consists of images paired with automatically generated questions that test visual reasoning. |
| Dataset Splits | Yes | "A correct answer is contained in the beam for over 98.5% of validation questions, suggesting that approximate inference is not a major cause of errors." and "The validation set is created from questions whose answers are the named entity type, but there must be multiple occurrences of that type in the document." |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions models like ELMo and ResNet-101, but does not provide specific software dependencies with version numbers (e.g., programming language versions, library versions like PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | Hyperparameters and training details are fully described in Appendix A. (Section 2.1) For example: The encoder contains 2 answer-independent LSTM layers and 3 answer-dependent LSTM layers, all of hidden size 128. The decoder contains 9 blocks, all with hidden size d = 256. We apply dropout (p = 0.55)... We train generatively with batches of 10 documents, using a cosine learning rate schedule with a period of 1 epoch, warming up over the first 5 epochs to a maximum learning rate of 10⁻⁴. During fine-tuning... Fine-tuning uses stochastic gradient descent with single-question batches, learning rate 5 × 10⁻⁵, and momentum 0.97. (A configuration sketch follows the table.) |
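The Experiment Setup row quotes concrete hyperparameters from Appendix A. As a reading aid, the minimal Python sketch below collects those values into a hypothetical config object and implements one plausible cosine-with-warmup learning-rate schedule; the class and function names, and the exact way the warmup interacts with the cosine period, are assumptions, since the authors release no code.

```python
import math
from dataclasses import dataclass

# Hypothetical names; values are taken from the Appendix A excerpt quoted above.
@dataclass
class GQAConfig:
    enc_answer_independent_lstm_layers: int = 2
    enc_answer_dependent_lstm_layers: int = 3
    enc_hidden_size: int = 128
    dec_blocks: int = 9
    dec_hidden_size: int = 256
    dropout: float = 0.55
    docs_per_batch: int = 10
    max_lr: float = 1e-4            # peak learning rate for generative training
    warmup_epochs: int = 5
    cosine_period_epochs: float = 1.0
    finetune_lr: float = 5e-5       # SGD with single-question batches
    finetune_momentum: float = 0.97


def generative_lr(epochs_done: float, cfg: GQAConfig = GQAConfig()) -> float:
    """Cosine schedule with a 1-epoch period and linear warmup over the first
    5 epochs. How the warmup scales the cosine is an assumption; the paper only
    states the period, warmup length, and maximum learning rate."""
    warmup = min(1.0, epochs_done / cfg.warmup_epochs) if cfg.warmup_epochs else 1.0
    phase = (epochs_done % cfg.cosine_period_epochs) / cfg.cosine_period_epochs
    return warmup * cfg.max_lr * 0.5 * (1.0 + math.cos(math.pi * phase))
```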
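Table 1 in the paper reports Exact Match (EM) and F1 on SQuAD. These are the standard SQuAD metrics, computed after lowercasing and removing punctuation and the articles a/an/the: EM checks for an exact string match, while F1 is a token-overlap score. The sketch below mirrors the usual SQuAD evaluation recipe in spirit; it is not the authors' code, and the function names are illustrative.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Standard SQuAD answer normalization: lowercase, drop punctuation and
    the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction).split(), normalize(gold).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

# SQuAD scores each prediction against all gold answers and keeps the maximum.
print(exact_match("the Eiffel Tower", "Eiffel Tower"))               # 1.0
print(round(token_f1("Eiffel Tower in Paris", "Eiffel Tower"), 2))   # 0.67
```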