reproducibilityindex.ai

LLMs Can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought

Authors: Zhuoxuan Jiang, Haoyuan Peng, Shanshan Feng, Fan Li, Dongsheng Li

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our approach on two public datasets featuring math problems of varying difficulty levels. The experiments demonstrate that our zero-shot prompting strategy significantly outperforms strong baselines.
Researcher Affiliation	Collaboration	(1) Shanghai Business School, Shanghai, China; (2) Learnable.AI Inc., Shanghai, China; (3) Centre for Frontier AI Research, A*STAR, Singapore; (4) Institute of High-Performance Computing, A*STAR, Singapore; (5) The Hong Kong Polytechnic University, Hong Kong, China; (6) Microsoft Research Asia, Shanghai, China
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Accessible on github.com/HaoyuanPeng/PedCoT-IJCAI24/
Open Datasets	Yes	We collect two public datasets containing step-level correctness labels for mathematical problems with different difficulties: BIG-Bench Mistake [Tyen et al., 2023] and PRM800K [Lightman et al., 2023].
Dataset Splits	No	The paper describes the datasets and their selection for experiments, but does not explicitly provide details about specific training/validation/test splits (e.g., percentages or counts for each split) used for reproducibility.
Hardware Specification	No	The paper does not specify the exact hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies	No	The paper does not provide specific version numbers for ancillary software dependencies (e.g., libraries, frameworks, or programming languages beyond the general mention of LLMs).
Experiment Setup	Yes	The temperature for generation is consistently set to 0 for both models to minimize the diversity of model outputs.
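
The temperature setting above can be illustrated with a generic sketch of temperature-scaled sampling. This is not code from the paper or its repository: the function names `sample_token` and `distinct_outputs`, the example scores, and the trial count are all assumptions made for illustration. It shows only why a temperature of 0 removes sampling diversity: decoding collapses to always picking the highest-scoring token.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores under temperature scaling.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring index.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide scores by the temperature, then apply a stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def distinct_outputs(logits, temperature, trials=200, seed=0):
    """Return the set of indices chosen over repeated sampling."""
    rng = random.Random(seed)
    return {sample_token(logits, temperature, rng) for _ in range(trials)}


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    print("temperature 0:", distinct_outputs(scores, 0))
    print("temperature 1:", distinct_outputs(scores, 1.0))
```

With temperature 0 every trial selects the same index, while a positive temperature lets lower-scoring indices appear. Hosted models may still show some run-to-run variation at temperature 0, so this setting is best read as minimizing output diversity, as the entry states, rather than guaranteeing identical outputs.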