Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge

Authors: Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant

NeurIPS 2020

Each entry below gives a reproducibility variable, the assessed result, and the supporting evidence quoted in the LLM response.
Research Type
  Result: Experimental
  Evidence: "We train our models by automatically generating examples that illustrate the expected types of inference." and "We evaluate our model in three different setups:" and "Table 1: Test set results for reasoning over hypernymy and meronymy relations. The models learn to reason with implicit rules, significantly improving on the hypothesis-only baseline, some in zero-shot."
Researcher Affiliation
  Result: Collaboration
  Evidence: Alon Talmor (1,2), Oyvind Tafjord (1), Peter Clark (1), Yoav Goldberg (1,3), Jonathan Berant (1,2); 1 The Allen Institute for AI, 2 Tel-Aviv University, 3 Bar-Ilan University
Pseudocode
  Result: No
  Evidence: The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code
  Result: Yes
  Evidence: "All our code and data is publicly available at http://github.com/alontalmor/LeapOfThought."
Open Datasets
  Result: Yes
  Evidence: "Our primary motivation is to develop models that work in an open domain environment with real world facts. Thus, we automatically generate data by sampling from existing knowledge sources: CONCEPTNET [12], WORDNET [13] and WIKIDATA [14]." and "fine-tune ROBERTA [8], on binary (yes/no) question answering tasks from two datasets (using standard multi-task training): (a) 50K examples from TWENTY QUESTIONS (20Q),1 a question answering (QA) dataset which includes questions such as Does an aircraft fly? (true) and Do everyone have an alarm? (false). This teaches the model to retrieve real world facts from its internal implicit knowledge; and (b) 100K examples from the RULETAKER [4] reasoning dataset, teaching the model to reason over a set of assertions explicitly provided as natural language statements."
  Footnote 1: https://github.com/allenai/twentyquestions
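The "standard multi-task training" over the two datasets quoted above can be realized in several ways; a minimal sketch of one common recipe, where examples from both tasks are pooled and shuffled so each task is sampled in proportion to its size (the exact mixing strategy is an assumption, not something the paper specifies):

```python
import random

def multitask_mixture(datasets, seed=0):
    """Yield (task_name, example) pairs with tasks mixed in proportion to
    their sizes. One common way to do multi-task training over several
    datasets; this recipe is an assumption, not the authors' code."""
    rng = random.Random(seed)
    # Pool every example with its task label, then shuffle the pool so a
    # given task appears with probability proportional to its dataset size.
    pool = [(name, ex) for name, data in datasets.items() for ex in data]
    rng.shuffle(pool)
    yield from pool
```

A trainer would then consume `multitask_mixture({"20q": twenty_q, "ruletaker": rule_taker})` as a single stream, with both corpora sharing one yes/no output head.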
Dataset Splits
  Result: Yes
  Evidence: "We generate 30,906 training examples using this procedure. We create development and test sets, 1,289 examples each, where the subjects and objects are disjoint from the training set." and "Overall 38,700/3,005/3,005 training/development/test examples were created."
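The disjointness constraint quoted above (dev/test subjects and objects never appear in training examples) can be sketched as follows; the `subject`/`object` field names and the greedy strategy are illustrative assumptions, not the authors' code:

```python
import random

def disjoint_splits(examples, n_dev, n_test, seed=0):
    """Split examples so that every subject and object appearing in
    dev/test is absent from train. A sketch of the constraint stated
    in the paper, not the authors' actual procedure."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    # Collect every entity that appears in dev or test ...
    held_out = {t for ex in dev + test for t in (ex["subject"], ex["object"])}
    # ... and keep only training examples that share none of them.
    train = [ex for ex in shuffled[n_dev + n_test:]
             if ex["subject"] not in held_out and ex["object"] not in held_out]
    return train, dev, test
```

Note that training examples sharing an entity with dev/test are simply dropped, so the final train count can be smaller than the raw pool.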
Hardware Specification
  Result: No
  Evidence: The paper does not provide specific hardware details (GPU or CPU models, or cloud instance types) used to run the experiments.
Software Dependencies
  Result: No
  Evidence: The paper mentions models such as ROBERTA and ESIM, but does not give version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup
  Result: No
  Evidence: The paper describes the input format and the binary cross-entropy loss, but does not report hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings.
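The binary cross-entropy loss named in the row above can be written out explicitly; a minimal, numerically stable sketch (this is the standard formulation, not code from the paper):

```python
import math

def bce_with_logits(logit, label):
    """Binary cross-entropy between a yes/no label (0.0 or 1.0) and a raw
    model logit z, in the numerically stable form
        max(z, 0) - z*y + log(1 + exp(-|z|)).
    Illustrates the loss the paper names, not its implementation."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def batch_loss(logits, labels):
    """Mean binary cross-entropy over a batch of (logit, label) pairs."""
    return sum(bce_with_logits(z, y) for z, y in zip(logits, labels)) / len(logits)
```

A confidently correct prediction (large logit, label 1) yields a loss near zero, while a confidently wrong one yields a loss near the logit's magnitude; an uncommitted logit of 0 gives log 2 regardless of the label.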