Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions
Authors: Peter Clark, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Turney, Daniel Khashabi
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the methods on six years of unseen, unedited exam questions from the NY Regents Science Exam (using only non-diagram, multiple choice questions), and show that our overall system's score is 71.3%, an improvement of 23.8% (absolute) over the MLN-based method described in previous work. ... We carry out ablation studies that quantify the contribution of each method to Aristo, and show that all levels of representation help. Our error analysis indicates the complementary strengths and weaknesses of each method, and directions for future work. |
| Researcher Affiliation | Collaboration | Peter Clark, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Turney Allen Institute for Artificial Intelligence {peterc,orene,tushark,ashishs,oyvindt,petert}@allenai.org Daniel Khashabi Cognitive Computation Lab (CCG), Univ Illinois, Urbana-Champaign khashab2@illinois.edu |
| Pseudocode | No | The paper describes methods and algorithms in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'Our datasets are being released to enable further research.' and 'We are releasing our datasets (at www.allenai.org) to encourage such research.' but does not explicitly state that the code for the described methodology is being released as open source. |
| Open Datasets | Yes | Our datasets are being released to enable further research. ... We are releasing our datasets (at www.allenai.org) to encourage such research. |
| Dataset Splits | No | The paper states '6 years of exams (108 NDMC questions) for training and 6 years (129 NDMC questions) for testing,' but does not explicitly mention a separate validation split with details. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'Lucene' and 'SCIP' and cites 'SCIP (Achterberg 2009)', but does not provide explicit version numbers for these or other software dependencies within the text. |
| Experiment Setup | No | The paper describes the model architecture and training of the combiner, but does not provide specific hyperparameters or detailed training configurations for the experimental setup. |
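
The headline figures quoted in the table can be cross-checked with simple arithmetic. The sketch below is illustrative only and uses hypothetical variable names. It derives the baseline score implied by the reported 71.3% and the 23.8-point absolute gain, and the total question count across the training and test exams. The derived values are inferences from the quoted numbers, not figures stated in the table.

```python
# Figures quoted in the table above; all variable names are hypothetical.
aristo_score = 71.3      # overall system score on the test exams, in percent
absolute_gain = 23.8     # reported absolute improvement over the MLN-based method
train_questions = 108    # NDMC questions from 6 years of training exams
test_questions = 129     # NDMC questions from 6 years of test exams

# An absolute improvement is a difference in percentage points,
# so the implied baseline is the score minus the gain.
implied_baseline = round(aristo_score - absolute_gain, 1)

# Relative improvement over that implied baseline, for contrast
# with the absolute figure.
relative_gain = round(100 * absolute_gain / implied_baseline, 1)

# Size of the full question set and the share held out for testing.
total_questions = train_questions + test_questions
test_fraction = round(test_questions / total_questions, 3)

print(implied_baseline, relative_gain, total_questions, test_fraction)
```

Reading the gain as absolute matters here: the same 23.8 points corresponds to a much larger relative improvement, so the two should not be confused when comparing against other reported results.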