RECKONING: Reasoning through Dynamic Knowledge Encoding
Authors: Zeming Chen, Gail Weiss, Eric Mitchell, Asli Celikyilmaz, Antoine Bosselut
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on three diverse multi-hop reasoning datasets show that RECKONING's performance improves over the in-context reasoning baseline (by up to 4.5%). We also find that compared to in-context reasoning, RECKONING generalizes better to longer reasoning chains unseen during training, is more robust to distractors in the context, and is computationally more efficient when multiple questions are asked about the same knowledge. |
| Researcher Affiliation | Collaboration | Zeming Chen¹, Gail Weiss¹, Eric Mitchell², Asli Celikyilmaz³, Antoine Bosselut¹ (¹EPFL, ²Stanford University, ³Meta AI Research) |
| Pseudocode | Yes | Algorithm 1 RECKONING; Algorithm 2 Dynamic Knowledge Encoding for Reasoning (a hedged sketch of the inner/outer-loop structure follows the table) |
| Open Source Code | No | The paper mentions using the Huggingface Transformers library [79] but does not provide a link or explicit statement about releasing the source code for their own method. |
| Open Datasets | Yes | We conduct our experiments on three datasets focusing on multi-hop logical reasoning over natural language knowledge: ProofWriter [74], which measures the model's ability to emulate reasoning over facts and rules expressed in natural language; CLUTRR-SG [29], which is generated from the CLUTRR [72] benchmark...; and FOLIO [30], a reasoning benchmark with first-order logical reasoning problems... |
| Dataset Splits | Yes | Table 6 (dataset splits and statistics): CLUTRR-SG (2-hop): 96,012 train / 10,972 validation / 3,102 test; ProofWriter (5-hop): 18,525 train / 2,553 validation / 5,175 test |
| Hardware Specification | Yes | All the experiments for RECKONING are conducted on a cluster with NVIDIA A100 (40GB) GPUs. All the baseline experiments are conducted on a local machine with NVIDIA RTX 3090 GPU (24GB). |
| Software Dependencies | No | The paper mentions using the Huggingface Transformers library [79] and implicitly relies on Python, but it does not provide version numbers for these components (e.g., Python 3.x, Transformers x.x.x) that would be needed to reproduce the software environment. |
| Experiment Setup | Yes | We set the train batch size to 16 and train the model for 6 epochs with early stopping... We set the learning rate to 3e-5 and use the AdamW optimizer with ϵ set to 1e-8. ...In the inner loop, we generally perform 4 gradient steps for lower-hop questions (2, 3, 4-hop) and 5 gradient steps for higher-hop questions (5 and 6-hop)... The inner-loop learning rate is set to 3e-5... In the outer loop, we also use the AdamW optimizer with a learning rate of 3e-5. For both optimizers, we set ϵ to 1e-8. We set the train batch size to 2 due to memory limitations. We apply the technique of gradient accumulation and set the accumulation step to 2. (These settings are consolidated in the hypothetical configuration sketch below the table.) |
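
The "Pseudocode" row names Algorithm 1 (RECKONING) and Algorithm 2 (Dynamic Knowledge Encoding for Reasoning), but the paper does not release code, so the following is only a minimal sketch of the bi-level structure those algorithms describe: an inner loop that encodes the given knowledge into the model's weights with a few gradient steps, and an outer loop that backpropagates the question-answering loss through those steps. A toy MLP stands in for the language model, random tensors stand in for facts and questions, and plain gradient steps stand in for the differentiable inner-loop optimizer; only the step counts and learning rates come from the reported setup.

```python
# Hypothetical sketch of RECKONING's bi-level training loop (Algorithms 1-2).
# NOT the authors' code: a toy MLP replaces the language model, random tensors
# replace the facts/questions, and plain SGD-style inner steps replace the
# differentiable inner-loop AdamW. Step counts and learning rates follow the
# reported experiment setup.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
outer_opt = torch.optim.AdamW(model.parameters(), lr=3e-5, eps=1e-8)

INNER_STEPS = 4   # 4 inner gradient steps for 2/3/4-hop questions, 5 for 5/6-hop
INNER_LR = 3e-5   # inner-loop learning rate as reported

def knowledge_loss(params, facts_x, facts_y):
    # Inner objective: memorize the knowledge (stand-in for the LM loss on facts).
    logits = functional_call(model, params, (facts_x,))
    return F.cross_entropy(logits, facts_y)

def question_loss(params, questions_x, answers_y):
    # Outer objective: answer questions using the knowledge-encoded weights.
    logits = functional_call(model, params, (questions_x,))
    return F.cross_entropy(logits, answers_y)

# One synthetic training example: a set of facts and questions about them.
facts_x, facts_y = torch.randn(8, 32), torch.randint(0, 2, (8,))
questions_x, answers_y = torch.randn(4, 32), torch.randint(0, 2, (4,))

# Inner loop (dynamic knowledge encoding): a few differentiable gradient steps.
params = dict(model.named_parameters())
for _ in range(INNER_STEPS):
    grads = torch.autograd.grad(
        knowledge_loss(params, facts_x, facts_y),
        list(params.values()),
        create_graph=True,  # keep the graph so the outer loss can reach these steps
    )
    params = {name: p - INNER_LR * g
              for (name, p), g in zip(params.items(), grads)}

# Outer loop: update the base weights by differentiating through the inner updates.
loss_q = question_loss(params, questions_x, answers_y)
outer_opt.zero_grad()
loss_q.backward()
outer_opt.step()
print(f"outer question-answering loss: {loss_q.item():.4f}")
```

Backpropagating through the inner steps is what makes the base weights learn to absorb new knowledge quickly; dropping `create_graph=True` would reduce this to a first-order approximation.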
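
The hyperparameters quoted in the "Experiment Setup" row can be collected into a single configuration. The mapping below is hypothetical: the split into a baseline configuration (batch size 16) and a RECKONING configuration (batch size 2 with gradient accumulation) is our reading of the quote, and the GPT-2 checkpoint and variable names are illustrative assumptions, not details confirmed by the table above.

```python
# Hypothetical consolidation of the reported hyperparameters. The baseline vs.
# RECKONING split follows our reading of the quoted setup, and the backbone
# checkpoint / variable names are illustrative assumptions only.
import torch
from transformers import GPT2LMHeadModel

baseline_config = {
    "train_batch_size": 16,   # "We set the train batch size to 16"
    "num_epochs": 6,          # "train the model for 6 epochs with early stopping"
    "learning_rate": 3e-5,    # AdamW with eps 1e-8
    "adam_epsilon": 1e-8,
}

reckoning_config = {
    "train_batch_size": 2,                           # "due to memory limitations"
    "gradient_accumulation_steps": 2,                # effective batch size of 4
    "inner_steps_by_hops": {2: 4, 3: 4, 4: 4, 5: 5, 6: 5},  # gradient steps per hop count
    "inner_lr": 3e-5,
    "outer_lr": 3e-5,
    "adam_epsilon": 1e-8,                            # for both inner- and outer-loop AdamW
}

model = GPT2LMHeadModel.from_pretrained("gpt2")  # assumed backbone, for illustration

# Outer-loop optimizer as reported: AdamW with lr 3e-5 and eps 1e-8.
outer_optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=reckoning_config["outer_lr"],
    eps=reckoning_config["adam_epsilon"],
)
```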