Get an A in Math: Progressive Rectification Prompting

Authors: Zhenyu Wu, Meng Jiang, Chao Shen

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, with the assessed result and the supporting LLM response for each:
Research Type: Experimental. "We conduct comprehensive experiments on eight math word problem datasets, including AddSub (Hosseini et al. 2014), SingleOp (Roy, Vieira, and Roth 2015), MultiArith (Roy and Roth 2015), SingleEq (Koncel-Kedziorski et al. 2015), SVAMP (Patel, Bhattamishra, and Goyal 2021), GSM8K (Cobbe et al. 2021), GSM-IC2-1K (Shi et al. 2023), and GSM-ICM-1K (Shi et al. 2023)."
Researcher Affiliation: Academia. 1) School of Cyber Science and Engineering, Xi'an Jiaotong University; 2) Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556.
Pseudocode: No. The paper includes "Figure 1: Overview of Progressive Rectification Prompting (PRP) method", a visual diagram illustrating the process, but it provides no formal pseudocode or algorithm block.
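Since the method is conveyed only through Figure 1, the following is a minimal, hypothetical sketch of the progressive verify-then-rectify loop the paper describes: generate an answer, mask one number in the problem, check that the model can recover it given the candidate answer, and regenerate while avoiding previously failed answers. The `llm` callable, prompt wording, and `mask_one_number` helper are illustrative assumptions, not the authors' code.

```python
import random
import re

def mask_one_number(problem: str):
    """Toy helper: replace one randomly chosen number in the problem with 'X'."""
    numbers = re.findall(r"\d+(?:\.\d+)?", problem)
    target = random.choice(numbers)
    return problem.replace(target, "X", 1), target

def prp_solve(problem: str, llm, max_iters: int = 5) -> str:
    """Progressively generate, verify, and rectify an answer (hypothetical sketch)."""
    wrong_answers = []  # candidates that already failed verification
    answer = llm(f"Q: {problem}\nA: Let's think step by step.")
    for _ in range(max_iters):  # K = 5 in the paper's setup
        # Verify: substitute the candidate answer back into the problem with one
        # number masked, and ask the model to recover the masked number.
        masked_problem, masked_number = mask_one_number(problem)
        predicted = llm(
            f"{masked_problem}\nThe answer to this problem is {answer}. "
            "What number should replace X?"
        )
        if predicted.strip() == masked_number:
            return answer  # passed the substitute-and-verify check
        # Rectify: regenerate while steering away from failed candidates.
        wrong_answers.append(answer)
        answer = llm(
            f"Q: {problem}\n(Hint: the answer is unlikely to be "
            f"{', '.join(wrong_answers)}.)\nA: Let's think step by step."
        )
    return answer  # fall back to the last candidate after K iterations
```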
Open Source Code: Yes. "Our implementation is made publicly available at https://wzy6642.github.io/prp.github.io/."
Open Datasets: Yes. "We conduct comprehensive experiments on eight math word problem datasets, including AddSub (Hosseini et al. 2014), SingleOp (Roy, Vieira, and Roth 2015), MultiArith (Roy and Roth 2015), SingleEq (Koncel-Kedziorski et al. 2015), SVAMP (Patel, Bhattamishra, and Goyal 2021), GSM8K (Cobbe et al. 2021), GSM-IC2-1K (Shi et al. 2023), and GSM-ICM-1K (Shi et al. 2023)."
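As a hedged illustration of availability, GSM8K is mirrored on the Hugging Face Hub and can be loaded with the `datasets` library; this tooling is an assumption, since the paper does not say how the datasets were obtained, and the other benchmarks are distributed by their respective authors.

```python
# Hedged illustration: loading GSM8K via the Hugging Face `datasets` library.
# The loading mechanism is an assumption; the paper does not specify one.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")  # ships with "train" and "test" splits
print(gsm8k["test"][0]["question"])    # records carry "question" and "answer" fields
```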
Dataset Splits: No. The paper lists the datasets used but does not give the train/validation/test split details (percentages, sample counts, or references to predefined splits) needed for reproducibility.
Hardware Specification: No. The paper uses text-davinci-003 as the backend large language model and compares it with text-davinci-002; both are accessed through public APIs, which implies cloud inference, but no underlying hardware (GPU model, CPU type, or cloud instance configuration) is specified.
Software Dependencies: Yes. "We use text-davinci-003 as the backend large language model, which is one of the most widely-used LLMs with public APIs. The few-shot baselines, including Manual-CoT (Wei et al. 2022), Auto-CoT (Zhang et al. 2023b), and PHP-CoT (Zheng et al. 2023), employ demonstration examples as suggested in the original papers. We set the temperature to 0.7 and set M to 10 for the SC experiments."
Experiment Setup: Yes. "We set the temperature to 0.7 and set M to 10 for the SC experiments. We set the maximum iteration number K to 5."
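For concreteness, here is a minimal sketch of this self-consistency (SC) setup under the legacy `openai<1.0` Python SDK (text-davinci-003 predates the current API): M = 10 samples at temperature 0.7, with a majority vote over extracted final answers. The prompt text and answer-extraction regex are illustrative assumptions; the maximum iteration number K = 5 corresponds to `max_iters` in the PRP sketch above.

```python
# Minimal sketch of the SC configuration: M = 10 samples at temperature 0.7,
# majority vote over extracted final answers. Assumes the legacy openai<1.0 SDK.
import re
from collections import Counter

import openai  # pip install "openai<1.0"; requires OPENAI_API_KEY to be set

M = 10             # number of sampled reasoning paths for self-consistency
TEMPERATURE = 0.7  # sampling temperature from the paper's setup

def self_consistent_answer(problem: str) -> str:
    """Sample M chain-of-thought completions and majority-vote the final number."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Q: {problem}\nA: Let's think step by step.",
        temperature=TEMPERATURE,
        max_tokens=256,
        n=M,  # draw all M samples in a single call
    )
    finals = []
    for choice in response["choices"]:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", choice["text"])
        if numbers:
            finals.append(numbers[-1])  # treat the last number as the final answer
    return Counter(finals).most_common(1)[0][0]
```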