Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle
Authors: Shangzi Xue, Zhenya Huang, Jiayu Liu, Xin Lin, Yuting Ning, Binbin Jin, Xin Li, Qi Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on three reasoning benchmarks, including ScienceQA, StrategyQA, and GSM8K, which cover a variety of reasoning tasks, demonstrating that our approach significantly reduces logical errors and enhances performance across various LLMs. |
| Researcher Affiliation | Academia | 1: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China 2: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center {xueshangzi,jy251198,linx,ningyt,bb0725}@mail.ustc.edu.cn; {huangzhy,leexin,qiliuql}@ustc.edu.cn |
| Pseudocode | Yes | Algorithm 1 Decompose-Analyze-Rethink |
| Open Source Code | Yes | Our code is available at: https://github.com/ShangziXue/DeAR |
| Open Datasets | Yes | We employ the ScienceQA [28] dataset for the knowledge reasoning task, and StrategyQA [12] for logical reasoning that requires multiple reasoning steps. We also verify the mathematical reasoning ability of our framework by applying it to the GSM8K dataset [8]. |
| Dataset Splits | Yes | For each dataset, we randomly sample 10% of its training set as a validation set to select different combinations of thresholds ϵ1 and ϵ2. |
| Hardware Specification | No | The paper mentions using GPT-3.5, LLaMA2-7B, and ChatGLM3-6B as LLM backbones but does not specify the hardware (e.g., specific GPU models or CPU types) on which these models were run for their experiments. |
| Software Dependencies | No | The paper mentions using specific LLM backbones like GPT-3.5, LLaMA2-7B, and ChatGLM3-6B, and accessing the OpenAI API. However, it does not specify software dependencies like Python versions, specific library versions (e.g., PyTorch, TensorFlow), or CUDA versions with numerical identifiers. |
| Experiment Setup | Yes | To ensure computational efficiency, we set the maximum depth to 4 and the maximum number of branches to 3 during the construction of the reasoning tree in DeAR. For each dataset, we randomly sample 10% of its training set as a validation set to select different combinations of thresholds ϵ1 and ϵ2. |
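The experiment-setup row describes two reproducible mechanics: a reasoning tree bounded by a maximum depth of 4 and at most 3 branches per node, and a random 10% hold-out of each training set for tuning the thresholds ϵ1 and ϵ2. A minimal sketch of both is below; the `decompose` stub and all function names are hypothetical stand-ins (the real decomposition is an LLM call in DeAR), only the depth/branch bounds and the 10% split come from the paper.

```python
import random

MAX_DEPTH = 4     # maximum reasoning-tree depth (from the paper's setup)
MAX_BRANCHES = 3  # maximum sub-questions per node (from the paper's setup)

def decompose(question):
    """Hypothetical stand-in for the LLM decomposition step: in DeAR an
    LLM splits a question into sub-questions; here we fabricate labels."""
    return [f"{question}.{i}" for i in range(MAX_BRANCHES)]

def build_tree(question, depth=0):
    """Build a bounded reasoning tree: each node holds a (sub-)question
    and up to MAX_BRANCHES children, stopping once MAX_DEPTH is reached."""
    node = {"question": question, "children": []}
    if depth < MAX_DEPTH:
        for sub in decompose(question)[:MAX_BRANCHES]:
            node["children"].append(build_tree(sub, depth + 1))
    return node

def make_validation_split(train_set, frac=0.10, seed=0):
    """Randomly hold out `frac` of the training set as a validation set,
    as the paper does to pick threshold combinations (ϵ1, ϵ2)."""
    rng = random.Random(seed)
    idx = list(range(len(train_set)))
    rng.shuffle(idx)
    cut = max(1, int(len(idx) * frac))
    val = [train_set[i] for i in idx[:cut]]
    train = [train_set[i] for i in idx[cut:]]
    return train, val
```

With these bounds, a fully expanded tree has 1 + 3 + 9 + 27 + 81 = 121 nodes, which is the worst-case number of LLM sub-question calls the setup implies per input question.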