Why think step by step? Reasoning emerges from the locality of experience

Authors: Ben Prystawski, Michael Li, Noah Goodman

NeurIPS 2023

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "We then test our hypothesis experimentally in more complex models, training an autoregressive language model on samples from Bayes nets but only including a subset of variables in each sample. We test language models' ability to match conditional probabilities with and without intermediate reasoning steps, finding that intermediate steps are only helpful when the training data is locally structured with respect to dependencies between variables." (an illustrative sketch of this with/without-reasoning comparison appears after the table)
Researcher Affiliation | Academia | Ben Prystawski, Department of Psychology, Stanford University, Stanford, CA 94305, benpry@stanford.edu; Michael Y. Li, Department of Computer Science, Stanford University, Stanford, CA 94305, michaelyli@stanford.edu; Noah D. Goodman, Departments of Psychology and Computer Science, Stanford University, Stanford, CA 94305, ngoodman@stanford.edu
Pseudocode | Yes | "Pseudocode for Bayes net generation is shown in Algorithm 1 of Appendix B. ... Pseudocode for selecting a subset of variables from an observation distribution is shown in Algorithm 2 in Appendix B." (see the data-generation sketch after this table)
Open Source Code | Yes | "Code and data are available at https://github.com/benpry/why-think-step-by-step."
Open Datasets | Yes | "For each of the 10 selected Bayes nets, we generate a training set consisting of 1 million samples formatted as strings. ... Code and data are available at https://github.com/benpry/why-think-step-by-step."
Dataset Splits | No | The paper describes generating training data and testing on "held-out pairs", but it does not explicitly define distinct training, validation, and test splits with specific percentages or counts.
Hardware Specification | Yes | "All models were trained on Nvidia Titan Xp GPUs."
Software Dependencies | No | The paper mentions software such as the Hugging Face transformers library and the Adam optimizer, but it does not provide specific version numbers for these dependencies (e.g., "PyTorch 1.9" or "Hugging Face Transformers v4.x.x").
Experiment Setup | Yes | "Our model has 512-dimensional embeddings, 10 layers, and 8 attention heads. ... We trained this architecture with randomly initialized weights for 300,000 gradient steps on batches containing 3,072 tokens each, for a total of 921,600,000 tokens of training. We trained models using the Adam optimizer [23]. Each model's training set consisted of 1,000,000 samples from a single Bayes net. ... an initial learning rate of 10^-3 and beta values of 0.9 and 0.999." (a training-configuration sketch follows the table)
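
The Pseudocode and Open Datasets rows refer to Algorithms 1 and 2 of Appendix B (Bayes net generation and selection of a variable subset from an observation distribution) and to training samples formatted as strings. The Python sketch below only illustrates the general shape of such a pipeline; the low-to-high edge rule, the neighbourhood-based subset rule, and the string format are assumptions made for illustration, not the paper's exact algorithms.

import random

def all_parent_assignments(k):
    # All 2^k binary assignments of k parents, as tuples.
    return [tuple((n >> b) & 1 for b in range(k)) for n in range(2 ** k)]

def make_random_bayes_net(n_vars, n_edges, seed=0):
    # Random DAG over binary variables with random conditional probability tables.
    # Edges only run from lower- to higher-indexed variables, guaranteeing acyclicity
    # (an assumption of this sketch, not necessarily the paper's construction).
    rng = random.Random(seed)
    parents = {v: [] for v in range(n_vars)}
    candidate_edges = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
    for i, j in rng.sample(candidate_edges, n_edges):
        parents[j].append(i)
    # cpts[v][parent_assignment] = P(X_v = 1 | parent values)
    cpts = {v: {a: rng.random() for a in all_parent_assignments(len(parents[v]))}
            for v in range(n_vars)}
    return parents, cpts

def ancestral_sample(parents, cpts, rng):
    # One joint sample, drawn in topological (index) order.
    sample = {}
    for v in sorted(parents):
        pa = tuple(sample[p] for p in parents[v])
        sample[v] = int(rng.random() < cpts[v][pa])
    return sample

def local_subset(sample, parents, center, radius=1):
    # Keep only variables within `radius` undirected hops of `center`; a stand-in
    # for the locally structured observation distributions of Algorithm 2.
    kept, frontier = {center}, {center}
    for _ in range(radius):
        nxt = set()
        for v in frontier:
            nxt.update(parents[v])                             # parents of v
            nxt.update(u for u in parents if v in parents[u])  # children of v
        kept |= nxt
        frontier = nxt
    return {v: sample[v] for v in sorted(kept)}

def to_string(subset):
    # Serialise a partial sample as a string, e.g. "X03=1 X07=0" (format assumed).
    return " ".join(f"X{v:02d}={val}" for v, val in subset.items())

rng = random.Random(1)
parents, cpts = make_random_bayes_net(n_vars=20, n_edges=25, seed=1)
sample = ancestral_sample(parents, cpts, rng)
print(to_string(local_subset(sample, parents, center=rng.randrange(20), radius=2)))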
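
The Experiment Setup row quotes the architecture and optimiser hyperparameters, and the Software Dependencies row notes that the Hugging Face transformers library was used without pinned versions. The following is a minimal sketch of a configuration matching those quoted numbers; the choice of the GPT-2 implementation class, the vocabulary size, context length, batch shape, and data loader are assumptions, and any learning-rate schedule behind the quoted "initial learning rate" is omitted.

import torch
from transformers import GPT2Config, GPT2LMHeadModel

VOCAB_SIZE = 64   # assumed: the real vocabulary covers variable names, values, and separators
SEQ_LEN = 512     # assumed context window
BATCH_SEQS = 6    # assumed: 6 sequences x 512 tokens = 3,072 tokens per batch

config = GPT2Config(
    vocab_size=VOCAB_SIZE,
    n_positions=SEQ_LEN,
    n_embd=512,   # 512-dimensional embeddings (quoted in the table)
    n_layer=10,   # 10 layers (quoted in the table)
    n_head=8,     # 8 attention heads (quoted in the table)
)
model = GPT2LMHeadModel(config)  # randomly initialised weights, trained from scratch

# Adam with the quoted hyperparameters: initial learning rate 1e-3, betas (0.9, 0.999).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

def next_batch():
    # Hypothetical stand-in for a loader over the serialised Bayes-net samples.
    return torch.randint(0, VOCAB_SIZE, (BATCH_SEQS, SEQ_LEN))

NUM_STEPS = 300_000  # 300,000 gradient steps, ~921.6M training tokens in total
for step in range(NUM_STEPS):
    input_ids = next_batch()
    loss = model(input_ids, labels=input_ids).loss  # standard causal-LM objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()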
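
The Research Type row describes testing the model's ability to match conditional probabilities with and without intermediate reasoning steps. The sketch below shows one way such a comparison could be run against a trained Hugging Face causal language model: direct prediction reads the next-token distribution for the target's value, while the with-reasoning variant lets the model sample intermediate variable assignments before the target and averages the sampled target values. The prompt formats, token names, and parsing rules here are assumptions, not the paper's exact procedure.

import torch

@torch.no_grad()
def estimate_direct(model, tokenizer, observed_str, target_name):
    # Direct prediction: condition on the observed variable, ask for the target's
    # value immediately, and compare P(next token = "1") with P(next token = "0").
    prompt = f"{observed_str} target: {target_name}="   # assumed prompt format
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
    p1 = probs[tokenizer.convert_tokens_to_ids("1")]
    p0 = probs[tokenizer.convert_tokens_to_ids("0")]
    return (p1 / (p0 + p1)).item()

@torch.no_grad()
def estimate_with_reasoning(model, tokenizer, observed_str, target_name,
                            n_samples=100, max_new_tokens=64):
    # With reasoning: let the model sample intermediate variable assignments
    # before the target, then average the sampled target values.
    prompt = f"{observed_str} target: {target_name}\n"   # assumed prompt format
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    values = []
    for _ in range(n_samples):
        out = model.generate(ids, do_sample=True, max_new_tokens=max_new_tokens)
        text = tokenizer.decode(out[0, ids.shape[1]:])
        parts = text.split(f"{target_name}=", 1)          # find the target's sampled value
        if len(parts) == 2 and parts[1][:1] in ("0", "1"):
            values.append(int(parts[1][0]))
    return sum(values) / len(values) if values else float("nan")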