Abductive Commonsense Reasoning

Authors: Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, Yejin Choi

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks: (i) Abductive NLI, a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG, a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. (A sketch of one Abductive NLI instance follows the table.)
Researcher Affiliation | Collaboration | Allen Institute for AI, Seattle, WA, USA; Facebook AI, Seattle, WA, USA; Paul G. Allen School of Computer Science & Engineering, WA, USA. {chandrab,ronanlb,chaitanyam,keisukes}@allenai.org; {arih,hannahr,dougd}@allenai.org; {yejin}@cs.washington.edu; {scottyih}@fb.com
Pseudocode | Yes | Algorithm 1 provides a formal description of our approach. In each iteration i, we train an adversarial model M_i on a random subset T_i of the data and update the validation set V_i to make it more challenging for M_i. (A sketch of this loop follows the table.)
Open Source Code | No | The paper states: "We will publicly release the ART dataset upon acceptance." and "Along with the ART dataset, we will publicly release templates and the full set of instructions for all crowdsourcing tasks to facilitate future data collection and research in this direction." These are promises of a future release of the data and templates, not current, concrete access to the source code for the methodology (e.g., the fine-tuning scripts for BERT or GPT).
Open Datasets | Yes | ART is the first large-scale benchmark dataset for studying abductive reasoning in narrative texts. It consists of 20K narrative contexts (pairs of observations O1, O2) with over 200K explanatory hypotheses. Table 6 in the Appendix summarizes corpus-level statistics of the ART dataset. We will publicly release the ART dataset upon acceptance. Data available to download at http://abductivecommonsense.xyz
Dataset Splits | Yes | Table 6: Some statistics summarizing the ART dataset. The train set includes all plausible and implausible hypotheses collected via crowdsourcing, while the dev and test sets include the hypotheses selected through the Adversarial Filtering algorithm.
Hardware Specification | No | The paper mentions "Computations on beaker.org were supported in part by credits from Google Cloud." in the acknowledgments, but it does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments.
Software Dependencies | No | The paper mentions software like BERT, GPT, GPT2, and COMeT, but it does not provide specific version numbers for these or any other software dependencies required to replicate the experiments.
Experiment Setup | Yes | We fine-tuned the BERT model using a grid search with the following set of hyper-parameters: batch size {3, 4, 8}, number of epochs {3, 4, 10}, learning rate {1e-5, 2e-5, 3e-5, 5e-5}. The warmup proportion was set to 0.2, and cross-entropy was used for computing the loss. (A sketch of this grid search follows the table.)
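
For concreteness, each Abductive NLI instance described in the Research Type row pairs two observations with two candidate explanations, and a model must pick the more plausible one. Below is a minimal sketch of such an instance and the selection step; the example text and field names are illustrative placeholders, not the released ART schema.

```python
# Minimal sketch of one Abductive NLI (multiple-choice) instance.
# The example text and field names are illustrative, not the released ART schema.
instance = {
    "obs1": "Ann left her bike unlocked outside the library.",  # first observation O1
    "obs2": "When she came back, the bike was gone.",            # second observation O2
    "hyp1": "Someone stole the unlocked bike.",                  # more plausible explanation
    "hyp2": "Ann had sold the bike the week before.",            # less plausible explanation
    "label": 1,                                                  # index of the better hypothesis
}

def choose_hypothesis(inst, score_fn):
    """Return 1 or 2 depending on which hypothesis score_fn rates as more plausible.

    score_fn(o1, hyp, o2) is a placeholder for any scoring model, e.g. a fine-tuned
    BERT classifier over the concatenated observations and hypothesis.
    """
    s1 = score_fn(inst["obs1"], inst["hyp1"], inst["obs2"])
    s2 = score_fn(inst["obs1"], inst["hyp2"], inst["obs2"])
    return 1 if s1 >= s2 else 2
```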
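
The Pseudocode row refers to the paper's Algorithm 1 (Adversarial Filtering). The following is a minimal sketch of that loop, assuming simple `train_model` and `harder_variant` callables that stand in for the paper's adversarial model and hypothesis-replacement step; it is not the authors' implementation.

```python
import random

def adversarial_filtering(dataset, train_model, harder_variant, num_iters=5, train_frac=0.8):
    """Sketch of the adversarial filtering loop described in Algorithm 1.

    Each iteration trains an adversarial model M_i on a random subset T_i and
    rewrites the held-out instances V_i so they become harder for M_i.
    `train_model` and `harder_variant` are assumed interfaces, not the paper's code.
    """
    data = list(dataset)
    for _ in range(num_iters):
        random.shuffle(data)
        cut = int(train_frac * len(data))
        t_i, v_i = data[:cut], data[cut:]   # random train subset T_i, validation set V_i

        model = train_model(t_i)            # adversarial model M_i

        for idx, inst in enumerate(v_i):
            if model.predicts_correctly(inst):
                # Swap in a distractor hypothesis that M_i gets wrong, if one exists.
                replacement = harder_variant(inst, model)
                if replacement is not None:
                    v_i[idx] = replacement
        data = t_i + v_i
    return data
```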
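
The grid reported in the Experiment Setup row can be enumerated directly. A minimal sketch follows, where `fine_tune_and_eval` is a hypothetical stand-in for the paper's BERT fine-tuning and dev-set evaluation routine (cross-entropy loss, warmup proportion 0.2).

```python
from itertools import product

# Hyper-parameter grid reported in the paper.
grid = {
    "batch_size": [3, 4, 8],
    "num_epochs": [3, 4, 10],
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
}
WARMUP_PROPORTION = 0.2  # fixed for all runs

def grid_search(fine_tune_and_eval):
    """Run every combination in the grid and keep the best dev accuracy.

    fine_tune_and_eval(batch_size, num_epochs, learning_rate, warmup_proportion)
    is a placeholder for the actual BERT fine-tuning and evaluation routine.
    """
    best = None
    for bs, ep, lr in product(grid["batch_size"], grid["num_epochs"], grid["learning_rate"]):
        acc = fine_tune_and_eval(bs, ep, lr, WARMUP_PROPORTION)
        if best is None or acc > best[0]:
            best = (acc, {"batch_size": bs, "num_epochs": ep, "learning_rate": lr})
    return best
```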