AlphaMath Almost Zero: Process Supervision without Process

Authors: Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on both in-domain and out-of-domain datasets demonstrate that even without GPT-4 or human-annotated process supervision, our AlphaMath framework achieves comparable or superior results to previous state-of-the-art methods.
Researcher Affiliation | Collaboration | Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan; Tongyi Lab; chenguoxin22@mails.ucas.ac.cn, {minpeng.lmp,xiji.lcx,k.fan}@alibaba-inc.com
Pseudocode | Yes | Algorithm 1: Inference with MCTS; Algorithm 2: Step-level Beam Search (a hedged sketch of step-level beam search follows this table)
Open Source Code | Yes | Code: https://github.com/MARIO-Math-Reasoning/Super_MARIO
Open Datasets | Yes | For the training sets, we exclusively extract question and answer pairs from GSM8K [7] and MATH [15], omitting the human-annotated solution analysis.
Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention or provide details for a separate validation split.
Hardware Specification | Yes | All experiments were conducted on Ubuntu 22.04 equipped with 8 * NVIDIA A100 GPUs.
Software Dependencies | Yes | Our code mainly depends on Python 3.11 and PyTorch 2.1.2. ... We trained all models with DeepSpeed ZeRO Stage 2 [29] and Flash-Attention 2 [9]. (see the dependency configuration sketch below)
Experiment Setup | Yes | For supervised fine-tuning, we set the learning rate to 4e-5, the batch size to 1024, and the weight of the value loss to 0.01 or 0.0005 (for Llama3 [11]), and train the model for 10 epochs. We employ the AdamW optimizer [24] and the cosine learning rate scheduler with the warmup rate set to 0.03. Table 6 provides key hyperparameters of AlphaMath. (see the hyperparameter sketch below)
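
The Pseudocode row references Algorithm 2, Step-level Beam Search. Below is a minimal Python sketch of the general idea, assuming a policy model that proposes candidate next reasoning steps and a value head that scores partial solutions. `propose_steps`, `score_partial_solution`, and the `<end>` terminal marker are hypothetical placeholders rather than code from the Super_MARIO repository, and the paper's Algorithm 2 may differ in details.

```python
def step_level_beam_search(question, propose_steps, score_partial_solution,
                           beam_size=3, expand_per_beam=5, max_steps=10):
    """Keep the top-`beam_size` partial solutions after every reasoning step."""
    beams = [([], 0.0)]  # each beam is (steps_so_far, value_score)
    for _ in range(max_steps):
        candidates = []
        for steps, _ in beams:
            # Policy model proposes several candidate next steps for this beam.
            for step in propose_steps(question, steps, n=expand_per_beam):
                new_steps = steps + [step]
                # Value head scores the partial solution (step-level signal).
                candidates.append((new_steps, score_partial_solution(question, new_steps)))
        # Keep only the highest-value partial solutions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        # Stop once every surviving beam has emitted a terminal step.
        # (A full implementation would freeze finished beams instead of re-expanding them.)
        if all(steps[-1].endswith("<end>") for steps, _ in beams):
            break
    return max(beams, key=lambda b: b[1])  # best-scoring solution and its value
```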
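
The Software Dependencies row mentions DeepSpeed ZeRO Stage 2 and Flash-Attention 2. As a rough illustration of what that stack looks like in practice, here is a minimal configuration sketch; the keys shown are standard DeepSpeed and Hugging Face transformers options, but the actual configuration shipped with Super_MARIO may differ.

```python
# Illustrative DeepSpeed ZeRO Stage 2 configuration; the repository's actual
# JSON config may set different values.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
}

# Flash-Attention 2 is typically enabled when loading the model with Hugging Face
# transformers (assumption: the repository may wire this up differently):
# model = AutoModelForCausalLM.from_pretrained(
#     model_path, attn_implementation="flash_attention_2")
```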
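
Finally, the Experiment Setup row can be summarized as a small configuration dictionary. The values below are taken directly from the reported setup; the field names are illustrative and not necessarily the keys used in the paper's training scripts or its Table 6.

```python
# Supervised fine-tuning hyperparameters as reported; key names are illustrative.
sft_config = {
    "learning_rate": 4e-5,
    "global_batch_size": 1024,
    "value_loss_weight": 0.01,   # 0.0005 when fine-tuning Llama3 [11]
    "num_epochs": 10,
    "optimizer": "AdamW",
    "lr_scheduler": "cosine",
    "warmup_ratio": 0.03,
}
```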