Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle

Authors: Shangzi Xue, Zhenya Huang, Jiayu Liu, Xin Lin, Yuting Ning, Binbin Jin, Xin Li, Qi Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on three reasoning benchmarks, including ScienceQA, StrategyQA, and GSM8K, which cover a variety of reasoning tasks, demonstrating that our approach significantly reduces logical errors and enhances performance across various LLMs.
Researcher Affiliation | Academia | 1: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China 2: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center {xueshangzi,jy251198,linx,ningyt,bb0725}@mail.ustc.edu.cn; {huangzhy,leexin,qiliuql}@ustc.edu.cn
Pseudocode | Yes | Algorithm 1 Decompose-Analyze-Rethink
Open Source Code | Yes | Our code is available at: https://github.com/ShangziXue/DeAR
Open Datasets | Yes | We employ the ScienceQA [28] dataset for the knowledge reasoning task. And we use StrategyQA [12] for logical reasoning that requires multiple reasoning steps. We also verify the mathematical reasoning ability of our framework by applying it to the GSM8K dataset [8].
Dataset Splits | Yes | For each dataset, we randomly sample 10% of its training set as a validation set to select different combinations of thresholds ϵ1 and ϵ2.
Hardware Specification | No | The paper mentions using GPT-3.5, LLaMA2-7B, and ChatGLM3-6B as LLM backbones but does not specify the hardware (e.g., specific GPU models or CPU types) on which these models were run for their experiments.
Software Dependencies | No | The paper mentions using specific LLM backbones like GPT-3.5, LLaMA2-7B, and ChatGLM3-6B, and accessing the OpenAI API. However, it does not specify software dependencies like Python versions, specific library versions (e.g., PyTorch, TensorFlow), or CUDA versions with numerical identifiers.
Experiment Setup | Yes | To ensure computational efficiency, we set the maximum depth to 4 and the maximum number of branches to 3 during the construction of the reasoning tree in DeAR. For each dataset, we randomly sample 10% of its training set as a validation set to select different combinations of thresholds ϵ1 and ϵ2.
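The bounded tree construction and 10% validation split quoted above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `decompose` is a hypothetical stand-in for DeAR's LLM-driven question decomposition, and the thresholds ϵ1/ϵ2 are only represented by the held-out split used to tune them.

```python
import random

MAX_DEPTH = 4      # maximum reasoning-tree depth, per the paper's setup
MAX_BRANCHES = 3   # maximum number of branches per node

def decompose(question):
    """Hypothetical stand-in for the LLM call that splits a question
    into sub-questions; here it just fabricates labeled children."""
    return [f"{question}.{i}" for i in range(MAX_BRANCHES)]

def build_tree(question, depth=0):
    """Recursively build a reasoning tree, cutting off expansion at
    MAX_DEPTH levels and MAX_BRANCHES children per node."""
    node = {"question": question, "children": []}
    if depth < MAX_DEPTH:
        for sub in decompose(question)[:MAX_BRANCHES]:
            node["children"].append(build_tree(sub, depth + 1))
    return node

def split_validation(train_set, frac=0.1, seed=0):
    """Randomly hold out `frac` of the training set as a validation
    set, mirroring the 10% split used to select the thresholds."""
    rng = random.Random(seed)
    held = set(rng.sample(range(len(train_set)), int(len(train_set) * frac)))
    train = [x for i, x in enumerate(train_set) if i not in held]
    val = [train_set[i] for i in sorted(held)]
    return train, val
```

With these caps, a fully expanded tree has at most 1 + 3 + 9 + 27 + 81 = 121 nodes, which is what keeps the construction computationally cheap.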