QASC: A Dataset for Question Answering via Sentence Composition
Authors: Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen, Ashish Sabharwal
AAAI 2020, pp. 8082-8090
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved as this model still lags behind human performance by 20%. Table 5: QASC scores for previous state-of-the-art models on multi-hop Science MCQ (OBQA), and BERT models with different corpora, retrieval approaches, and additional fine-tuning. |
| Researcher Affiliation | Collaboration | Tushar Khot, Peter Clark, Michal Guerquin, Peter Jansen+, Ashish Sabharwal. Allen Institute for AI, Seattle, WA, U.S.A. +University of Arizona, Tucson, AZ, U.S.A. {tushark, peterc, michalg, ashishs}@allenai.org, pajansen@email.arizona.edu |
| Pseudocode | No | The paper describes methods verbally but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | Questions, annotated facts, and corpora are available at https://github.com/allenai/qasc. The link provides access to the dataset and annotated facts, but not the source code for the described methods (e.g., the retrieval approach or adversarial choice selection). |
| Open Datasets | Yes | We propose a novel dataset, Question Answering via Sentence Composition (QASC; pronounced kask) of 9,980 multi-hop multiple-choice questions (MCQs) where simple syntactic cues are insufficient to determine how to decompose the question into simpler queries. Questions, annotated facts, and corpora are available at https://github.com/allenai/qasc. |
| Dataset Splits | Yes | To enable fine-tuning models, we split the questions into 5962/825/873 train/dev/test folds, resp. (A loading sketch that checks the released splits follows this table.) |
| Hardware Specification | No | The paper states 'Computations on beaker.org were supported in part by credits from Google Cloud' but does not provide specific hardware details such as GPU models, CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software like BERT, spaCy, langdetect, and ftfy, but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | No | For consistency, we use the same hyper-parameter sweep in all fine-tuning experiments (cf. Appendix D). The paper defers hyperparameter details to an appendix that is not included in the available text. (An illustrative sweep skeleton follows this table.) |
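
The dataset rows above point to the public QASC release. As a quick sanity check on what that release actually contains, here is a minimal loading sketch. It is not the authors' code: it assumes the Hugging Face Hub mirror `allenai/qasc` of the data at https://github.com/allenai/qasc, and the field names (`formatted_question`, `answerKey`, `fact1`, `fact2`) follow that mirror rather than the original release files.

```python
# Minimal sketch, not the authors' code. Assumes the Hugging Face Hub
# mirror "allenai/qasc" of https://github.com/allenai/qasc; field names
# follow that mirror and may differ from the original release files.
from datasets import load_dataset

qasc = load_dataset("allenai/qasc")

# Print the split sizes the mirror actually ships. The 5962/825/873
# fine-tuning folds quoted above may not match the public release,
# so verify rather than assume.
for split, ds in qasc.items():
    print(f"{split}: {len(ds)} questions")

# Each entry is an 8-way MCQ annotated with the two facts that
# compose to answer it.
ex = qasc["train"][0]
print(ex["formatted_question"])  # question text plus lettered choices
print("gold:", ex["answerKey"])  # empty on the hidden test split
print("fact1:", ex["fact1"])
print("fact2:", ex["fact2"])
```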
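
The experiment-setup row notes that the paper reuses one hyper-parameter sweep across all fine-tuning runs but defers the grid to Appendix D. The skeleton below shows only the shape of such a sweep; every grid value and the `train_and_eval` stub are hypothetical placeholders, not the paper's settings.

```python
# Illustrative sweep skeleton. The actual grid is in the paper's
# Appendix D (not available here); all values and the train_and_eval
# stub are hypothetical.
import itertools

GRID = {
    "learning_rate": [1e-5, 2e-5, 3e-5],  # hypothetical values
    "num_epochs": [3, 4],                 # hypothetical values
    "batch_size": [16, 32],               # hypothetical values
}

def train_and_eval(cfg: dict) -> float:
    # Stand-in for fine-tuning BERT on QASC with `cfg` and returning
    # dev accuracy; returns a constant so the skeleton runs as-is.
    return 0.0

def sweep(grid: dict) -> dict:
    # Enumerate the full Cartesian product once; reusing this same
    # grid for every model is what keeps comparisons consistent.
    keys = list(grid)
    best_cfg, best_acc = None, float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        acc = train_and_eval(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg

print(sweep(GRID))
```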