Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision

Authors: Faeze Brahman, Vered Shwartz, Rachel Rudinger, Yejin Choi

AAAI 2021, pp. 12592-12601 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, results, and LLM responses:

Research Type: Experimental
LLM Response: In this paper, we investigate the extent to which neural models can reason about natural language rationales that explain model predictions, relying only on distant supervision with no additional annotation cost for human-written rationales. We investigate multiple ways to automatically generate rationales using pre-trained language models, neural knowledge models, and distant supervision from related tasks, and train generative models capable of composing explanatory rationales for unseen instances. We demonstrate our approach on the defeasible inference task, a nonmonotonic reasoning task in which an inference may be strengthened or weakened when new information (an update) is introduced.

Researcher Affiliation: Collaboration
LLM Response: 1 University of California Santa Cruz; 2 Allen Institute for AI; 3 Paul G. Allen School of Computer Science & Engineering, University of Washington; 4 University of Maryland, College Park, MD

Pseudocode: No
LLM Response: No pseudocode or clearly labeled algorithm blocks were found in the paper. The methodology is described in natural language.

Open Source Code: Yes
LLM Response: The code and data are available at: https://github.com/fabrahman/RationaleGen

Open Datasets: Yes
LLM Response: "We focus on the Stanford Natural Language Inference dataset (SNLI; Bowman et al. 2015), in which image captions serve as premises, and hypotheses were crowdsourced." and "The dataset for these tasks was built by crowdsourcing update sentences for neutral sentence-pairs from existing NLI datasets. Specifically, we use the SNLI portion of their data." and "We augmented the δ-NLI dataset described in with rationales that explain why a hypothesis is more likely after learning about a strengthener update and less likely after learning about a weakener. Rather than eliciting rationales from humans, we take a distant supervision approach and gather rationales from various sources, as exemplified in Table 1."

Dataset Splits: Yes
LLM Response: We follow the original split to train (80%), test (10%), and development (10%) sets. By augmenting the data with multiple rationales per original δ-NLI instance, the final eδ-NLI dataset consists of 731,579 training, 15,781 test, and 15,527 development instances.

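The reported counts can be sanity-checked once the released data is downloaded. Below is a minimal sketch, assuming the augmented eδ-NLI splits are stored as JSONL files named train.jsonl, dev.jsonl, and test.jsonl under a directory e-delta-nli; these names are hypothetical and the released layout may differ.

```python
# Count instances per split and compare against the figures reported in the paper.
# The directory and file names are assumptions, not taken from the released repository.
from pathlib import Path

DATA_DIR = Path("e-delta-nli")  # hypothetical location of the downloaded data
EXPECTED = {"train": 731_579, "dev": 15_527, "test": 15_781}

for split, expected in EXPECTED.items():
    path = DATA_DIR / f"{split}.jsonl"
    n = sum(1 for line in path.open(encoding="utf-8") if line.strip())
    status = "OK" if n == expected else "MISMATCH"
    print(f"{split}: {n:,} instances (expected {expected:,}) -> {status}")
```
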
Hardware Specification: Yes
LLM Response: training each model for a single epoch with batch size of 8 (GPT2), and 128 (Bart) on a Quadro RTX 8000 GPU machine.

Software Dependencies: No
LLM Response: The paper mentions software such as SpaCy and the Transformers package, and models such as GPT2-XL and Bart-L, but it does not provide version numbers for these dependencies (e.g., Transformers vX.Y or SpaCy vX.Y), which would be needed to reproduce the environment.

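Since no versions are pinned, anyone attempting a reproduction may want to record the environment they actually use. The snippet below is an illustrative addition, not taken from the paper or its repository; torch is assumed only because it is the usual backend for the models mentioned.

```python
# Log the versions of the key libraries so a reproduction attempt is self-documenting.
import spacy
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)  # assumed backend for GPT2-XL / Bart-L
print("spacy:", spacy.__version__)
```
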
Experiment Setup: Yes
LLM Response: We fine-tune transformer-based pre-trained LMs on the e-δNLI dataset. Specifically, we use GPT2-XL (Radford et al. 2019) and Bart-L (Lewis et al. 2020). We use the Transformers package (Wolf et al. 2019), training each model for a single epoch with batch size of 8 (GPT2), and 128 (Bart) on a Quadro RTX 8000 GPU machine.

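For concreteness, the sketch below mirrors the reported GPT2 setting (one epoch, batch size 8) using the Hugging Face Transformers Trainer API. The input template, field names, and toy training example are assumptions made for illustration; the authors' released code may format the data differently.

```python
# Illustrative causal-LM fine-tuning for rationale generation with Transformers.
# Hyperparameters follow the paper (1 epoch, batch size 8 for GPT-2); the input
# template and the single toy example are hypothetical.
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

MODEL_NAME = "gpt2"  # use "gpt2-xl" to match the paper's model size

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default


class RationaleDataset(Dataset):
    """Serializes (premise, hypothesis, update, rationale) into one LM sequence."""

    def __init__(self, examples, max_len=256):
        self.items = []
        for ex in examples:
            text = (f"premise: {ex['premise']} hypothesis: {ex['hypothesis']} "
                    f"update: {ex['update']} rationale: {ex['rationale']}"
                    f"{tokenizer.eos_token}")
            enc = tokenizer(text, truncation=True, max_length=max_len,
                            padding="max_length", return_tensors="pt")
            input_ids = enc["input_ids"].squeeze(0)
            attention_mask = enc["attention_mask"].squeeze(0)
            labels = input_ids.clone()
            labels[attention_mask == 0] = -100  # no loss on padding positions
            self.items.append({"input_ids": input_ids,
                               "attention_mask": attention_mask,
                               "labels": labels})

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


# A single toy instance standing in for the eδ-NLI training data (hypothetical).
train_dataset = RationaleDataset([{
    "premise": "A man is walking a dog on a trail.",
    "hypothesis": "The dog is on a leash.",
    "update": "The dog wears a harness clipped to a lead.",
    "rationale": "A harness clipped to a lead implies the dog is restrained.",
}])

training_args = TrainingArguments(
    output_dir="rationale-gen-gpt2",
    num_train_epochs=1,             # single epoch, as reported
    per_device_train_batch_size=8,  # batch size 8 for the GPT-2 model
    logging_steps=10,
)

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
Trainer(model=model, args=training_args, train_dataset=train_dataset).train()
```

The Bart-L variant reported in the paper would instead fine-tune a sequence-to-sequence model with the larger batch size of 128.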