Abductive Commonsense Reasoning
Authors: Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, Yejin Choi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. |
| Researcher Affiliation | Collaboration | Allen Institute for AI, Seattle, WA, USA; Facebook AI, Seattle, WA, USA; Paul G. Allen School of Computer Science & Engineering, WA, USA. {chandrab,ronanlb,chaitanyam,keisukes}@allenai.org {arih,hannahr,dougd}@allenai.org {yejin}@cs.washington.edu {scottyih}@fb.com |
| Pseudocode | Yes | Algorithm 1 provides a formal description of our approach. In each iteration i, we train an adversarial model Mi on a random subset Ti of the data and update the validation set Vi to make it more challenging for Mi. |
| Open Source Code | No | The paper states: "We will publicly release the ART dataset upon acceptance." and "Along with the ART dataset, we will publicly release templates and the full set of instructions for all crowdsourcing tasks to facilitate future data collection and research in this direction." This is a promise for future release of data and templates, not current, concrete access to the source code for the methodology (e.g., the fine-tuning scripts for BERT or GPT). |
| Open Datasets | Yes | ART is the first large-scale benchmark dataset for studying abductive reasoning in narrative texts. It consists of 20K narrative contexts (pairs of observations O1, O2) with over 200K explanatory hypotheses. Table 6 in the Appendix summarizes corpus-level statistics of the ART dataset. We will publicly release the ART dataset upon acceptance. Data available to download at http://abductivecommonsense.xyz |
| Dataset Splits | Yes | Table 6: Some statistics summarizing the ART dataset. The train set includes all plausible and implausible hypotheses collected via crowdsourcing, while the dev and test sets include the hypotheses selected through the Adversarial Filtering algorithm. |
| Hardware Specification | No | The paper mentions "Computations on beaker.org were supported in part by credits from Google Cloud." in the acknowledgments, but it does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper mentions software like BERT, GPT, GPT2, and COMeT, but it does not provide specific version numbers for these or any other software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | We fine-tuned the BERT model using a grid search with the following set of hyper-parameters: batch size: {3, 4, 8} number of epochs: {3, 4, 10} learning rate: {1e-5, 2e-5, 3e-5, 5e-5} The warmup proportion was set to 0.2, and cross-entropy was used for computing the loss. |
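The Adversarial Filtering procedure quoted in the Pseudocode row (train an adversary Mi on a random subset Ti, then update the validation pool Vi to be harder for Mi) can be sketched as below. This is a minimal illustration, not the authors' released implementation; the instance format (`label`, `distractor`, `alternatives`) and the `train_model` callable are hypothetical placeholders.

```python
import random

def adversarial_filter(pool, train_model, n_iters=3, train_frac=0.8, seed=0):
    """Sketch of Adversarial Filtering over a pool of instances.

    Hypothetical instance format: each instance is a dict with a gold
    'label', a current 'distractor' hypothesis, and a list 'alternatives'
    of harder candidate distractors to swap in.
    """
    rng = random.Random(seed)
    pool = list(pool)
    for _ in range(n_iters):
        rng.shuffle(pool)
        cut = int(train_frac * len(pool))
        train_split, val_split = pool[:cut], pool[cut:]  # T_i and V_i
        model = train_model(train_split)                 # adversary M_i
        # Harden V_i: wherever the adversary answers correctly, the
        # instance is "too easy", so replace its distractor with a
        # harder alternative when one is available.
        for inst in val_split:
            if model(inst) == inst["label"] and inst["alternatives"]:
                inst["distractor"] = inst["alternatives"].pop()
    return pool
```

In the paper's setting the adversary is a fine-tuned classifier and the pool holds plausible/implausible hypothesis pairs; here `train_model` stands in for that fine-tuning step.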
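The hyper-parameter grid reported in the Experiment Setup row (batch size {3, 4, 8}, epochs {3, 4, 10}, learning rate {1e-5, 2e-5, 3e-5, 5e-5}) implies 3 × 3 × 4 = 36 fine-tuning runs. A minimal sketch of that search, where `evaluate` is an assumed callable that fine-tunes BERT with a given config and returns dev accuracy:

```python
from itertools import product

# Grid reported in the paper's BERT fine-tuning setup.
GRID = {
    "batch_size": [3, 4, 8],
    "num_epochs": [3, 4, 10],
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
}

def grid_search(evaluate, grid=GRID):
    """Exhaustive grid search: try every config, keep the best by
    the score `evaluate` returns (e.g. dev-set accuracy)."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Per the paper, each run would also fix warmup proportion to 0.2 and use cross-entropy loss; those are constants rather than searched values, so they sit inside `evaluate`.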