Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning

Authors: Subhabrata Dutta, Joykirat Singh, Soumen Chakrabarti, Tanmoy Chakraborty

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility assessment (Variable: Result, followed by the LLM response):
Research Type: Experimental. This work investigates the neural sub-structures within LLMs that manifest CoT reasoning from a mechanistic point of view. From an analysis of Llama-2 7B applied to multistep reasoning over fictional ontologies, we demonstrate that LLMs deploy multiple parallel pathways of answer generation for step-by-step reasoning. These parallel pathways provide sequential answers from the input question context as well as the generated CoT. Our findings supply empirical answers to a pertinent open question about whether LLMs actually rely on CoT to answer questions (Tan, 2023; Lampinen et al., 2022).
Researcher Affiliation: Academia. Subhabrata Dutta (EMAIL), IIT Delhi, India; Joykirat Singh (EMAIL), Independent; Soumen Chakrabarti (EMAIL), IIT Bombay, India; Tanmoy Chakraborty (EMAIL), IIT Delhi, India.
Pseudocode: No. The paper describes methods and procedures in prose, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code: Yes. The source code and data are made available at https://github.com/joykirat18/How-To-Think-Step-by-Step.
Open Datasets: Yes. To minimize the effects of MLP blocks and focus primarily on reasoning from the provided context, we make use of the PrOntoQA dataset (Saparov & He, 2023), which employs ontology-based question answering over fictional entities (see Figure 1 for an example).
Dataset Splits: Yes. Total training pairs: 28392; total testing pairs: 9204. All three types of pairs (positively related, negatively related, and unrelated) are present in equal proportion in the training and testing data.
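The equal-proportion split described above amounts to a stratified partition: each relation type appears in the same ratio in train and test. A minimal sketch of such a partition (a hypothetical helper, not the authors' code; the label names and toy data are assumptions):

```python
import random
from collections import defaultdict

def stratified_split(pairs, test_fraction, seed=0):
    """Partition (example, label) pairs into train/test so that every
    label keeps the same proportion in both splits."""
    by_label = defaultdict(list)
    for example, label in pairs:
        by_label[label].append((example, label))
    rng = random.Random(seed)
    train, test = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

# Toy usage with the paper's three relation types, equally represented.
labels = ["positively_related", "negatively_related", "unrelated"]
pairs = [(f"pair_{i}", labels[i % 3]) for i in range(300)]
# Reuse the paper's overall test fraction: 9204 / (28392 + 9204).
train, test = stratified_split(pairs, test_fraction=9204 / (28392 + 9204))
```

With 100 examples per label and a test fraction of roughly 0.245, each label contributes the same number of test examples, preserving the equal-proportion property the paper reports.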
Hardware Specification: No. The paper mentions using Llama-2 7B as the model, but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies: No. The paper mentions using Llama-2 7B but does not specify any other software dependencies or their version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup: Yes. We use 6-shot examples of CoT for generation in all the experiments. ... a 4-layer MLP model, 4096 × 2 → 128 → 64 → 32 → 3, with ReLU between each linear layer. Learning rate: 0.00005; number of epochs: 120.