Learning to Reason and Memorize with Self-Notes

Authors: Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments across a wide variety of tasks demonstrate that our method can outperform chain-of-thought and scratchpad methods by taking Self-Notes that interleave the input text."
Researcher Affiliation | Industry | Jack Lanchantin (Meta AI), Shubham Toshniwal (NVIDIA), Jason Weston (Meta AI), Arthur Szlam (Meta AI), Sainbayar Sukhbaatar (Meta AI)
Pseudocode | No | The paper describes its method verbally and with examples but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | Reproducibility statement: "We will make code and data publicly available."
Open Datasets | Yes | "We test our method on seven text datasets designed to evaluate multi-step reasoning and state-tracking: a proposed synthetic Toy-Story task, two synthetic program evaluation tasks [11, 16], two real-world chess game tasks [17], and two math word problem tasks previously used to test chain-of-thought prompting, MultiArith and GSM8K [18, 19]."
Dataset Splits | Yes | Table 8 (Dataset Statistics) reports the number of train, validation, and test examples for each task, with test data split into in-domain and out-of-domain sets.
Hardware Specification | Yes | "We fine-tune all of the GPT-2 models on 8 NVIDIA V100 GPUs using an on-site cluster."
Software Dependencies | Yes | "The GSM8K experiments were done using the text-davinci-003 model with the OpenAI API."
Experiment Setup | Yes | "For each non-prompting task, we train for a fixed 30 epochs with a learning rate of 2e-5 and batch size of 32." (A hedged training-configuration sketch follows the table.)
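The Experiment Setup and Hardware rows give enough detail to reconstruct a basic fine-tuning configuration. The snippet below is a minimal sketch, not the authors' released code (the paper only promises a release): it wires the reported hyperparameters (30 epochs, learning rate 2e-5, batch size 32, 8 V100 GPUs) into a Hugging Face Transformers GPT-2 fine-tuning run. The toy example text, the "<note> ... </note>" delimiters, and the 4-per-GPU batch split are illustrative assumptions, not details taken from the paper.

```python
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Toy stand-in for the Self-Notes training data: a note is interleaved with
# the input text. The "<note> ... </note>" delimiters are illustrative only.
texts = [
    "Frank has the ball. Frank went to the park. "
    "<note> The ball is at the park. </note> Where is the ball? The park.",
]
train_ds = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="selfnotes-gpt2",
    num_train_epochs=30,            # fixed 30 epochs, as reported
    learning_rate=2e-5,             # as reported
    per_device_train_batch_size=4,  # 4 per GPU x 8 V100s = global batch of 32 (assumed split)
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

In an actual reproduction, the single toy example would be replaced by the task-specific training sets listed under Open Datasets, with Self-Note annotations interleaved into the input text as described in the paper; everything else about the data pipeline here is a placeholder.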