RECKONING: Reasoning through Dynamic Knowledge Encoding

Authors: Zeming Chen, Gail Weiss, Eric Mitchell, Asli Celikyilmaz, Antoine Bosselut

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on three diverse multi-hop reasoning datasets show that RECKONING's performance improves over the in-context reasoning baseline (by up to 4.5%). We also find that compared to in-context reasoning, RECKONING generalizes better to longer reasoning chains unseen during training, is more robust to distractors in the context, and is computationally more efficient when multiple questions are asked about the same knowledge.
Researcher Affiliation | Collaboration | Zeming Chen¹, Gail Weiss¹, Eric Mitchell², Asli Celikyilmaz³, Antoine Bosselut¹ (¹EPFL, ²Stanford University, ³Meta AI Research)
Pseudocode | Yes | Algorithm 1: RECKONING; Algorithm 2: Dynamic Knowledge Encoding for Reasoning. A hedged sketch of the bi-level procedure is given after the table.
Open Source Code | No | The paper mentions using the Huggingface Transformers library [79] but does not provide a link or an explicit statement about releasing the source code for their own method.
Open Datasets | Yes | We conduct our experiments on three datasets focusing on multi-hop logical reasoning over natural language knowledge: ProofWriter [74], which measures the model's ability to emulate reasoning over facts and rules expressed in natural language; CLUTRR-SG [29], which is generated from the CLUTRR [72] benchmark...; and FOLIO [30], a reasoning benchmark with first-order logical reasoning problems...
Dataset Splits | Yes | Table 6 (dataset splits and statistics): CLUTRR-SG (2-hop): 96,012 train / 10,972 validation / 3,102 test; ProofWriter (5-hop): 18,525 train / 2,553 validation / 5,175 test.
Hardware Specification | Yes | All the experiments for RECKONING are conducted on a cluster with NVIDIA A100 (40GB) GPUs. All the baseline experiments are conducted on a local machine with an NVIDIA RTX 3090 GPU (24GB).
Software Dependencies | No | The paper mentions using the Huggingface Transformers library [79] and implicitly relies on Python, but it does not provide specific version numbers for these components (e.g., Python 3.x, Transformers x.y.z), which would be needed for exact reproducibility.
Experiment Setup | Yes | We set the train batch size to 16 and train the model for 6 epochs with early stopping... We set the learning rate to 3e-5 and use the AdamW optimizer with ϵ set to 1e-8. ...In the inner loop, we generally perform 4 gradient steps for lower-hop questions (2, 3, 4-hop) and 5 gradient steps for higher-hop questions (5 and 6-hop)... The inner-loop learning rate is set to 3e-5... In the outer loop, we also use AdamW with a learning rate of 3e-5. For both optimizers, we set ϵ to 1e-8. We set the train batch size to 2 due to memory limitations. We apply the technique of gradient accumulation and set the accumulation step to 2.
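
The Pseudocode row only names the paper's algorithms; for readers without the PDF, below is a minimal sketch of the bi-level procedure they describe: an inner loop that encodes the given knowledge into the model's weights with a few language-modeling gradient steps, and an outer loop that trains the original weights through those updates to answer the question. This is not the authors' released implementation; it assumes a Hugging Face causal LM (GPT-2 as a stand-in backbone), the `higher` library for differentiable inner-loop updates, and plain SGD as the inner-loop optimizer.

# Illustrative sketch of RECKONING-style bi-level training; NOT the authors' released code.
import torch
import higher
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in backbone
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

outer_opt = torch.optim.AdamW(model.parameters(), lr=3e-5, eps=1e-8)  # outer loop
inner_opt = torch.optim.SGD(model.parameters(), lr=3e-5)              # inner loop (assumed SGD)

def reckoning_step(knowledge_texts, question_text, answer_text, inner_steps=4):
    """One outer-loop update: encode knowledge in the inner loop, answer in the outer loop."""
    know = tokenizer(knowledge_texts, return_tensors="pt", padding=True)
    know_labels = know["input_ids"].masked_fill(know["attention_mask"] == 0, -100)
    qa = tokenizer(question_text + " " + answer_text, return_tensors="pt")

    outer_opt.zero_grad()
    with higher.innerloop_ctx(model, inner_opt, copy_initial_weights=False) as (fmodel, diffopt):
        # Inner loop: a few LM-loss gradient steps that write the knowledge into the weights.
        for _ in range(inner_steps):
            inner_loss = fmodel(**know, labels=know_labels).loss
            diffopt.step(inner_loss)
        # Outer loop: answer the question with the knowledge-encoded weights; the gradient
        # flows back through the inner updates to the original parameters (MAML-style).
        # For simplicity the loss here covers the full question+answer sequence.
        outer_loss = fmodel(**qa, labels=qa["input_ids"]).loss
        outer_loss.backward()
    outer_opt.step()
    return outer_loss.item()

Backpropagating through the inner updates is what makes the outer loop meaningful: the base parameters are trained so that a handful of gradient steps on any knowledge set leaves the model able to answer the associated multi-hop questions.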
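
For quick reference, the hyperparameters quoted in the Experiment Setup row are gathered below as a configuration sketch. The grouping into a baseline fine-tuning block and a RECKONING inner/outer-loop block is an interpretation of the quoted text, not a configuration file released by the authors.

# Hyperparameters as quoted above, organized for reference; the baseline/RECKONING split
# is an interpretation, not an official config.
from dataclasses import dataclass

@dataclass
class BaselineFinetuneConfig:
    train_batch_size: int = 16
    max_epochs: int = 6              # trained with early stopping
    learning_rate: float = 3e-5      # AdamW
    adam_epsilon: float = 1e-8

@dataclass
class ReckoningConfig:
    inner_steps_low_hop: int = 4     # 2-, 3-, and 4-hop questions
    inner_steps_high_hop: int = 5    # 5- and 6-hop questions
    inner_lr: float = 3e-5
    outer_lr: float = 3e-5           # AdamW
    adam_epsilon: float = 1e-8       # both optimizers
    train_batch_size: int = 2        # limited by GPU memory
    gradient_accumulation_steps: int = 2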