Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision

Authors: Faeze Brahman, Vered Shwartz, Rachel Rudinger, Yejin Choi

AAAI 2021, pp. 12592-12601 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, results, and LLM responses:

Research Type: Experimental
LLM Response: In this paper, we investigate the extent to which neural models can reason about natural language rationales that explain model predictions, relying only on distant supervision with no additional annotation cost for human-written rationales. We investigate multiple ways to automatically generate rationales using pre-trained language models, neural knowledge models, and distant supervision from related tasks, and train generative models capable of composing explanatory rationales for unseen instances. We demonstrate our approach on the defeasible inference task, a nonmonotonic reasoning task in which an inference may be strengthened or weakened when new information (an update) is introduced.

Researcher Affiliation: Collaboration
LLM Response: 1 University of California Santa Cruz; 2 Allen Institute for AI; 3 Paul G. Allen School of Computer Science & Engineering, University of Washington; 4 University of Maryland, College Park, MD

Pseudocode: No
LLM Response: No pseudocode or clearly labeled algorithm blocks were found in the paper. The methodology is described in natural language.

Open Source Code: Yes
LLM Response: The code and data are available at: https://github.com/fabrahman/RationaleGen

Open Datasets: Yes
LLM Response: "We focus on the Stanford Natural Language Inference dataset (SNLI; Bowman et al. 2015), in which image captions serve as premises, and hypotheses were crowdsourced." and "The dataset for these tasks was built by crowdsourcing update sentences for neutral sentence-pairs from existing NLI datasets. Specifically, we use the SNLI portion of their data." and "We augmented the δ-NLI dataset described in with rationales that explain why a hypothesis is more likely after learning about a strengthener update and less likely after learning about a weakener. Rather than eliciting rationales from humans, we take a distant supervision approach and gather rationales from various sources, as exemplified in Table 1."

Dataset Splits: Yes
LLM Response: We follow the original split to train (80%), test (10%), and development (10%) sets. By augmenting the data with multiple rationales per original δ-NLI instance, the final eδ-NLI dataset consists of 731,579 training, 15,781 test, and 15,527 development instances.

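The reported counts can be sanity-checked once the released data is downloaded. Below is a minimal sketch, assuming the augmented eδ-NLI splits are stored as JSONL files named train.jsonl, dev.jsonl, and test.jsonl under a directory e-delta-nli; these names are hypothetical and the released layout may differ.

```python
# Count instances per split and compare against the figures reported in the paper.
# The directory and file names are assumptions, not taken from the released repository.
from pathlib import Path

DATA_DIR = Path("e-delta-nli")  # hypothetical location of the downloaded data
EXPECTED = {"train": 731_579, "dev": 15_527, "test": 15_781}

for split, expected in EXPECTED.items():
    path = DATA_DIR / f"{split}.jsonl"
    n = sum(1 for line in path.open(encoding="utf-8") if line.strip())
    status = "OK" if n == expected else "MISMATCH"
    print(f"{split}: {n:,} instances (expected {expected:,}) -> {status}")
```
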
Hardware Specification: Yes
LLM Response: training each model for a single epoch with batch size of 8 (GPT2), and 128 (Bart) on a Quadro RTX 8000 GPU machine.

Software Dependencies: No
LLM Response: The paper mentions software such as SpaCy and the Transformers package, and models such as GPT2-XL and Bart-L, but it does not provide version numbers for these dependencies (e.g., Transformers vX.Y or SpaCy vX.Y), which would be needed to reproduce the environment.

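Since no versions are pinned, anyone attempting a reproduction may want to record the environment they actually use. The snippet below is an illustrative addition, not taken from the paper or its repository; torch is assumed only because it is the usual backend for the models mentioned.

```python
# Log the versions of the key libraries so a reproduction attempt is self-documenting.
import spacy
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)  # assumed backend for GPT2-XL / Bart-L
print("spacy:", spacy.__version__)
```
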
Experiment Setup: Yes
LLM Response: We fine-tune transformer-based pre-trained LMs on the e-δNLI dataset. Specifically, we use GPT2-XL (Radford et al. 2019) and Bart-L (Lewis et al. 2020). We use the Transformers package (Wolf et al. 2019), training each model for a single epoch with batch size of 8 (GPT2), and 128 (Bart) on a Quadro RTX 8000 GPU machine.

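For concreteness, the sketch below mirrors the reported GPT2 setting (one epoch, batch size 8) using the Hugging Face Transformers Trainer API. The input template, field names, and toy training example are assumptions made for illustration; the authors' released code may format the data differently.

```python
# Illustrative causal-LM fine-tuning for rationale generation with Transformers.
# Hyperparameters follow the paper (1 epoch, batch size 8 for GPT-2); the input
# template and the single toy example are hypothetical.
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

MODEL_NAME = "gpt2"  # use "gpt2-xl" to match the paper's model size

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default


class RationaleDataset(Dataset):
    """Serializes (premise, hypothesis, update, rationale) into one LM sequence."""

    def __init__(self, examples, max_len=256):
        self.items = []
        for ex in examples:
            text = (f"premise: {ex['premise']} hypothesis: {ex['hypothesis']} "
                    f"update: {ex['update']} rationale: {ex['rationale']}"
                    f"{tokenizer.eos_token}")
            enc = tokenizer(text, truncation=True, max_length=max_len,
                            padding="max_length", return_tensors="pt")
            input_ids = enc["input_ids"].squeeze(0)
            attention_mask = enc["attention_mask"].squeeze(0)
            labels = input_ids.clone()
            labels[attention_mask == 0] = -100  # no loss on padding positions
            self.items.append({"input_ids": input_ids,
                               "attention_mask": attention_mask,
                               "labels": labels})

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


# A single toy instance standing in for the eδ-NLI training data (hypothetical).
train_dataset = RationaleDataset([{
    "premise": "A man is walking a dog on a trail.",
    "hypothesis": "The dog is on a leash.",
    "update": "The dog wears a harness clipped to a lead.",
    "rationale": "A harness clipped to a lead implies the dog is restrained.",
}])

training_args = TrainingArguments(
    output_dir="rationale-gen-gpt2",
    num_train_epochs=1,             # single epoch, as reported
    per_device_train_batch_size=8,  # batch size 8 for the GPT-2 model
    logging_steps=10,
)

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
Trainer(model=model, args=training_args, train_dataset=train_dataset).train()
```

The Bart-L variant reported in the paper would instead fine-tune a sequence-to-sequence model with the larger batch size of 128.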