Get an A in Math: Progressive Rectification Prompting
Authors: Zhenyu Wu, Meng Jiang, Chao Shen
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on eight math word problem datasets, including AddSub (Hosseini et al. 2014), SingleOp (Roy, Vieira, and Roth 2015), MultiArith (Roy and Roth 2015), SingleEq (Koncel-Kedziorski et al. 2015), SVAMP (Patel, Bhattamishra, and Goyal 2021), GSM8K (Cobbe et al. 2021), GSM-IC2-1K (Shi et al. 2023), and GSM-ICM-1K (Shi et al. 2023). |
| Researcher Affiliation | Academia | (1) School of Cyber Science and Engineering, Xi'an Jiaotong University; (2) Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556 |
| Pseudocode | No | The paper includes 'Figure 1: Overview of Progressive Rectification Prompting (PRP) method' which is a visual diagram illustrating the process, but it does not provide formal pseudocode or an algorithm block. |
| Open Source Code | Yes | Our implementation is made publicly available at https://wzy6642.github.io/prp.github.io/. |
| Open Datasets | Yes | We conduct comprehensive experiments on eight math word problem datasets, including AddSub (Hosseini et al. 2014), SingleOp (Roy, Vieira, and Roth 2015), MultiArith (Roy and Roth 2015), SingleEq (Koncel-Kedziorski et al. 2015), SVAMP (Patel, Bhattamishra, and Goyal 2021), GSM8K (Cobbe et al. 2021), GSM-IC2-1K (Shi et al. 2023), and GSM-ICM-1K (Shi et al. 2023). |
| Dataset Splits | No | The paper lists datasets used but does not explicitly provide details about training, validation, or test data splits with percentages, sample counts, or specific predefined split references necessary for reproducibility. |
| Hardware Specification | No | The paper mentions using 'text-davinci-003 as the backend large language model' and comparing it with 'text-davinci-002', which are specific LLM models (software). It also mentions 'public APIs' for these models, implying cloud usage, but does not specify any underlying hardware like GPU models, CPU types, or specific cloud instance configurations. |
| Software Dependencies | Yes | We use text-davinci-003 as the backend large language model, which is one of the most widely-used LLMs with public APIs. The few-shot baselines, including Manual-CoT (Wei et al. 2022), Auto-CoT (Zhang et al. 2023b), and PHP-CoT (Zheng et al. 2023), employ demonstration examples as suggested in the original papers. We set the temperature to 0.7 and set M to 10 for the SC experiments. |
| Experiment Setup | Yes | We set the temperature to 0.7 and set M to 10 for the SC experiments. We set the maximum iteration number K to 5. |
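To make the reported setup concrete, below is a minimal sketch of an iterative answer-verify-rectify loop in the spirit of PRP, wired to the hyperparameters quoted above (text-davinci-003 via the public API, temperature 0.7, maximum iteration number K = 5). The prompt wording, helper names, and the simplified verification check are illustrative assumptions, not the authors' released implementation; the paper's own code and exact prompt templates are at the URL listed in the table.

```python
# Hedged sketch of a PRP-style answer -> verify -> rectify loop.
# Hyperparameters (K = 5, temperature = 0.7, text-davinci-003) come from the
# paper's reported setup; every prompt string and helper name below is an
# illustrative assumption, not the authors' code.
import re
import openai  # assumes the legacy openai Python SDK (< 1.0); set openai.api_key first

MODEL = "text-davinci-003"
TEMPERATURE = 0.7   # from the paper's setup
K = 5               # maximum rectification iterations, from the paper's setup


def complete(prompt: str) -> str:
    """One call to the legacy Completions endpoint with the reported settings."""
    resp = openai.Completion.create(
        model=MODEL, prompt=prompt, temperature=TEMPERATURE, max_tokens=256
    )
    return resp["choices"][0]["text"]


def extract_number(text: str):
    """Pull the last number out of a model response (simple illustrative heuristic)."""
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return float(nums[-1]) if nums else None


def prp_answer(question: str):
    """Answer a math word problem, then verify and rectify up to K times."""
    answer = extract_number(
        complete(f"Q: {question}\nA: Let's think step by step.")
    )
    for _ in range(K):
        # Verification step (assumed prompt): ask the model to re-derive the
        # answer given the candidate, and check agreement.
        check = complete(
            f"Q: {question}\nA proposed answer is {answer}. "
            "Verify it by re-solving the problem step by step, "
            "then state the final numeric answer."
        )
        verified = extract_number(check)
        if verified is not None and verified == answer:
            break  # candidate is self-consistent; stop early
        # Rectification step (assumed prompt): re-answer with a hint that the
        # previous candidate should not be repeated.
        answer = extract_number(complete(
            f"Q: {question}\nThe answer is probably not {answer}. "
            "Solve the problem again step by step and give the final numeric answer."
        ))
    return answer


if __name__ == "__main__":
    print(prp_answer("Tom has 3 boxes with 4 apples each. He eats 2 apples. "
                     "How many apples are left?"))
```

The loop structure (stop as soon as verification agrees, otherwise rectify, for at most K rounds) mirrors the progressive-rectification idea the paper describes; the masking-based verification used in the actual method is abstracted here into a single re-solve check for brevity.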