Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision
Authors: Faeze Brahman, Vered Shwartz, Rachel Rudinger, Yejin Choi
AAAI 2021, pp. 12592–12601
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we investigate the extent to which neural models can reason about natural language rationales that explain model predictions, relying only on distant supervision with no additional annotation cost for human-written rationales. We investigate multiple ways to automatically generate rationales using pre-trained language models, neural knowledge models, and distant supervision from related tasks, and train generative models capable of composing explanatory rationales for unseen instances. We demonstrate our approach on the defeasible inference task, a nonmonotonic reasoning task in which an inference may be strengthened or weakened when new information (an update) is introduced. |
| Researcher Affiliation | Collaboration | 1 University of California, Santa Cruz; 2 Allen Institute for AI; 3 Paul G. Allen School of Computer Science & Engineering, University of Washington; 4 University of Maryland, College Park, MD |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. The methodology is described in natural language. |
| Open Source Code | Yes | The code and data are available at: https://github.com/fabrahman/RationaleGen |
| Open Datasets | Yes | "We focus on the Stanford Natural Language Inference dataset (SNLI; Bowman et al. 2015), in which image captions serve as premises, and hypotheses were crowdsourced." and "The dataset for these tasks was built by crowdsourcing update sentences for neutral sentence-pairs from existing NLI datasets. Specifically, we use the SNLI portion of their data." and "We augmented the δ-NLI dataset with rationales that explain why a hypothesis is more likely after learning about a strengthener update and less likely after learning about a weakener. Rather than eliciting rationales from humans, we take a distant supervision approach and gather rationales from various sources, as exemplified in Table 1." |
| Dataset Splits | Yes | We follow the original split to train (80%), test (10%), and development (10%) sets. By augmenting the data with multiple rationales per original δ-NLI instance, the final e-δ-NLI dataset consists of 731,579 training, 15,781 test, and 15,527 development instances. |
| Hardware Specification | Yes | training each model for a single epoch with batch size of 8 (GPT2), and 128 (Bart) on a Quadro RTX 8000 GPU machine. |
| Software Dependencies | No | The paper mentions software like 'SpaCy' and the 'Transformers package' and models like 'GPT2-XL' and 'Bart-L'. However, it does not provide specific version numbers for these software dependencies (e.g., 'Transformers vX.Y' or 'SpaCy vX.Y') needed to reproduce the environment. (A version-recording sketch follows the table.) |
| Experiment Setup | Yes | We fine-tune transformer-based pre-trained LMs on the e-δ-NLI dataset. Specifically, we use GPT2-XL (Radford et al. 2019) and Bart-L (Lewis et al. 2020). We use the Transformers package (Wolf et al. 2019), training each model for a single epoch with batch size of 8 (GPT2), and 128 (Bart) on a Quadro RTX 8000 GPU machine. (A minimal fine-tuning sketch follows the table.) |
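
The authors' training script is not reproduced in this summary. The sketch below shows one way the reported configuration (GPT2-XL fine-tuned with the Hugging Face Transformers Trainer for a single epoch at batch size 8) could look; the data file name and the `text` field are illustrative assumptions, not details from the released code.

```python
# Minimal sketch of the reported fine-tuning setup: a causal LM (the paper uses
# GPT2-XL, and separately Bart-L) trained for one epoch with batch size 8 via
# the Transformers Trainer. File name and field layout below are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "gpt2-xl"                  # model reported in the paper
DATA_FILE = "e_delta_nli_train.jsonl"   # hypothetical path to the e-δ-NLI training split

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Assume each JSON line has a single "text" field holding the linearized
# premise / hypothesis / update / rationale sequence.
dataset = load_dataset("json", data_files=DATA_FILE, split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="rationale-gen-gpt2xl",
    num_train_epochs=1,                # single epoch, as reported
    per_device_train_batch_size=8,     # batch size 8 for GPT2, per the paper
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```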
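
Because dependency versions are unspecified, anyone re-running the experiments may want to record the versions they actually used. The snippet below is one minimal way to do that; the package list is an assumption based on the tools named in the paper.

```python
# Record the installed versions of the libraries the paper names, since the
# paper itself does not pin them. importlib.metadata is standard library (3.8+).
from importlib.metadata import PackageNotFoundError, version

for package in ("transformers", "torch", "spacy"):
    try:
        print(f"{package}=={version(package)}")
    except PackageNotFoundError:
        print(f"{package} is not installed")
```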