reproducibilityindex.ai

Learning to Reason and Memorize with Self-Notes

Authors: Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments across a wide variety of tasks demonstrate that our method can outperform chain-of-thought and scratchpad methods by taking Self-Notes that interleave the input text.
Researcher Affiliation	Industry	Jack Lanchantin (Meta AI), Shubham Toshniwal (NVIDIA), Jason Weston (Meta AI), Arthur Szlam (Meta AI), Sainbayar Sukhbaatar (Meta AI)
Pseudocode	No	The paper describes its methods verbally and with examples but does not include any structured pseudocode or algorithm blocks.
Open Source Code	No	Reproducibility statement: We will make code and data publicly available.
Open Datasets	Yes	We test our method on seven text datasets designed to evaluate multi-step reasoning and state-tracking: a proposed synthetic Toy-Story task, two synthetic program evaluation tasks [11, 16], two real-world chess game tasks [17], and two math word problem tasks previously used to test chain-of-thought prompting, MultiArith and GSM8K [18, 19].
Dataset Splits	Yes	Table 8: Dataset Statistics. Columns: # train, # valid, # test (in-domain, out-of-domain).
Hardware Specification	Yes	We fine-tune all of the GPT-2 models on 8 NVIDIA V100 GPUs using an on-site cluster.
Software Dependencies	Yes	The GSM8K experiments were done using the text-davinci-003 model with the OpenAI API.
Experiment Setup	Yes	For each non-prompting task, we train for a fixed 30 epochs with a learning rate of 2e-5 and batch size of 32.
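
For anyone attempting a reproduction, the fine-tuning settings quoted in the Experiment Setup and Hardware Specification rows can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the names `FineTuneConfig` and `total_optimizer_steps` are hypothetical, the training-set size is an illustrative placeholder, and treating the batch size of 32 as the global batch (rather than per GPU) is an assumption the quoted text does not settle.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the reported fine-tuning settings."""

    # Values quoted in the Experiment Setup row.
    epochs: int = 30
    learning_rate: float = 2e-5
    batch_size: int = 32
    # Quoted in the Hardware Specification row (8 NVIDIA V100 GPUs).
    num_gpus: int = 8


def total_optimizer_steps(cfg: FineTuneConfig, num_train_examples: int) -> int:
    """Count optimizer steps over the whole run.

    Assumes batch_size is the global batch size and that the final,
    possibly smaller, batch of each epoch is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(num_train_examples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = FineTuneConfig()
# 10_000 is a made-up training-set size used only for illustration;
# the real per-task sizes are in the paper's Table 8.
print(total_optimizer_steps(cfg, 10_000))
```

Because the run length is fixed at 30 epochs, the total step count depends only on each task's training-set size, which is why the dataset statistics matter for estimating compute.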