Get an A in Math: Progressive Rectification Prompting

Authors: Zhenyu Wu, Meng Jiang, Chao Shen

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, with the assessed result and the supporting LLM response for each:
Research Type: Experimental. "We conduct comprehensive experiments on eight math word problem datasets, including AddSub (Hosseini et al. 2014), SingleOp (Roy, Vieira, and Roth 2015), MultiArith (Roy and Roth 2015), SingleEq (Koncel-Kedziorski et al. 2015), SVAMP (Patel, Bhattamishra, and Goyal 2021), GSM8K (Cobbe et al. 2021), GSM-IC2-1K (Shi et al. 2023), and GSM-ICM-1K (Shi et al. 2023)."
Researcher Affiliation: Academia. 1) School of Cyber Science and Engineering, Xi'an Jiaotong University; 2) Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556.
Pseudocode: No. The paper includes "Figure 1: Overview of Progressive Rectification Prompting (PRP) method", a visual diagram illustrating the process, but it provides no formal pseudocode or algorithm block.
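Since the method is conveyed only through Figure 1, the following is a minimal, hypothetical sketch of the progressive verify-then-rectify loop the paper describes: generate an answer, mask one number in the problem, check that the model can recover it given the candidate answer, and regenerate while avoiding previously failed answers. The `llm` callable, prompt wording, and `mask_one_number` helper are illustrative assumptions, not the authors' code.

```python
import random
import re

def mask_one_number(problem: str):
    """Toy helper: replace one randomly chosen number in the problem with 'X'."""
    numbers = re.findall(r"\d+(?:\.\d+)?", problem)
    target = random.choice(numbers)
    return problem.replace(target, "X", 1), target

def prp_solve(problem: str, llm, max_iters: int = 5) -> str:
    """Progressively generate, verify, and rectify an answer (hypothetical sketch)."""
    wrong_answers = []  # candidates that already failed verification
    answer = llm(f"Q: {problem}\nA: Let's think step by step.")
    for _ in range(max_iters):  # K = 5 in the paper's setup
        # Verify: substitute the candidate answer back into the problem with one
        # number masked, and ask the model to recover the masked number.
        masked_problem, masked_number = mask_one_number(problem)
        predicted = llm(
            f"{masked_problem}\nThe answer to this problem is {answer}. "
            "What number should replace X?"
        )
        if predicted.strip() == masked_number:
            return answer  # passed the substitute-and-verify check
        # Rectify: regenerate while steering away from failed candidates.
        wrong_answers.append(answer)
        answer = llm(
            f"Q: {problem}\n(Hint: the answer is unlikely to be "
            f"{', '.join(wrong_answers)}.)\nA: Let's think step by step."
        )
    return answer  # fall back to the last candidate after K iterations
```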
Open Source Code: Yes. "Our implementation is made publicly available at https://wzy6642.github.io/prp.github.io/."
Open Datasets: Yes. "We conduct comprehensive experiments on eight math word problem datasets, including AddSub (Hosseini et al. 2014), SingleOp (Roy, Vieira, and Roth 2015), MultiArith (Roy and Roth 2015), SingleEq (Koncel-Kedziorski et al. 2015), SVAMP (Patel, Bhattamishra, and Goyal 2021), GSM8K (Cobbe et al. 2021), GSM-IC2-1K (Shi et al. 2023), and GSM-ICM-1K (Shi et al. 2023)."
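As a hedged illustration of availability, GSM8K is mirrored on the Hugging Face Hub and can be loaded with the `datasets` library; this tooling is an assumption, since the paper does not say how the datasets were obtained, and the other benchmarks are distributed by their respective authors.

```python
# Hedged illustration: loading GSM8K via the Hugging Face `datasets` library.
# The loading mechanism is an assumption; the paper does not specify one.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")  # ships with "train" and "test" splits
print(gsm8k["test"][0]["question"])    # records carry "question" and "answer" fields
```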
Dataset Splits: No. The paper lists the datasets used but does not give the train/validation/test split details (percentages, sample counts, or references to predefined splits) needed for reproducibility.
Hardware Specification: No. The paper uses text-davinci-003 as the backend large language model and compares it with text-davinci-002; both are accessed through public APIs, which implies cloud inference, but no underlying hardware (GPU model, CPU type, or cloud instance configuration) is specified.
Software Dependencies: Yes. "We use text-davinci-003 as the backend large language model, which is one of the most widely-used LLMs with public APIs. The few-shot baselines, including Manual-CoT (Wei et al. 2022), Auto-CoT (Zhang et al. 2023b), and PHP-CoT (Zheng et al. 2023), employ demonstration examples as suggested in the original papers. We set the temperature to 0.7 and set M to 10 for the SC experiments."
Experiment Setup: Yes. "We set the temperature to 0.7 and set M to 10 for the SC experiments. We set the maximum iteration number K to 5."
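For concreteness, here is a minimal sketch of this self-consistency (SC) setup under the legacy `openai<1.0` Python SDK (text-davinci-003 predates the current API): M = 10 samples at temperature 0.7, with a majority vote over extracted final answers. The prompt text and answer-extraction regex are illustrative assumptions; the maximum iteration number K = 5 corresponds to `max_iters` in the PRP sketch above.

```python
# Minimal sketch of the SC configuration: M = 10 samples at temperature 0.7,
# majority vote over extracted final answers. Assumes the legacy openai<1.0 SDK.
import re
from collections import Counter

import openai  # pip install "openai<1.0"; requires OPENAI_API_KEY to be set

M = 10             # number of sampled reasoning paths for self-consistency
TEMPERATURE = 0.7  # sampling temperature from the paper's setup

def self_consistent_answer(problem: str) -> str:
    """Sample M chain-of-thought completions and majority-vote the final number."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Q: {problem}\nA: Let's think step by step.",
        temperature=TEMPERATURE,
        max_tokens=256,
        n=M,  # draw all M samples in a single call
    )
    finals = []
    for choice in response["choices"]:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", choice["text"])
        if numbers:
            finals.append(numbers[-1])  # treat the last number as the final answer
    return Counter(finals).most_common(1)[0][0]
```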