AutoPSV: Automated Process-Supervised Verifier

Authors: Jianqiao Lu, Zhiyang Dou, Hongru Wang, Zeyu Cao, Jianbo Dai, Yunlong Feng, Zhijiang Guo

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments across five datasets, including mathematical reasoning benchmarks and commonsense reasoning tasks. The results demonstrate that our method effectively improves the reasoning capability of the model with our highly efficient labeling scheme for process supervision.
Researcher Affiliation | Academia | 1 The University of Hong Kong, 2 The Chinese University of Hong Kong, 3 University of Cambridge, 4 University of Edinburgh, 5 Independent
Pseudocode | No | The paper describes methods textually and with diagrams (Figure 1) but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of AUTOPSV is available at https://github.com/rookie-joe/AutoPSV.
Open Datasets | Yes (dataset-loading sketch below) | For mathematical reasoning, we include GSM8K [21], containing math word problems requiring multi-step reasoning, and MATH [36], composed of high school-level competition problems covering a range of math subjects.
Dataset Splits | No | The paper utilizes several benchmarks (GSM8K, MATH, HellaSwag, Winogrande, ANLI) for evaluation and mentions using 'GSM8K training prompts' and the 'GSM8K test set', but it does not explicitly provide the specific percentages or counts for training, validation, and test splits across all datasets used for the main experiments.
Hardware Specification | Yes | Our experiments were conducted using 8 NVIDIA A100 GPUs, each with 40GB of memory.
Software Dependencies | No (answer-checking sketch below) | The paper mentions using Python's eval function and the AdamW optimizer, but it does not specify version numbers for Python, any programming libraries (e.g., PyTorch, TensorFlow), or other software dependencies.
Experiment Setup | Yes (training-configuration sketch below) | Verifier Training Configuration: For both process-supervised and outcome-supervised methods, we maintain consistent training parameters as detailed in Table 13. The training process spans 1 epoch with a batch size of 512 and a learning rate of 2 × 10^-6, incorporating a 3% learning rate warmup period. Table 13: Verifier Training Hyperparameters. Table 14: SFT Training Hyperparameters.
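
The datasets cited in the Open Datasets row are publicly hosted, so the following is a minimal sketch of loading GSM8K with the Hugging Face datasets library. The hub identifier "openai/gsm8k", its "main" configuration, and the field names are assumptions about the hosted copy rather than details stated in the paper.

```python
# Hedged sketch: load the public GSM8K benchmark named in the Open Datasets row.
# The hub ID "openai/gsm8k" and its "main" config are assumptions about the hosted copy.
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main")  # provides train and test splits
example = gsm8k["train"][0]
print(example["question"])                    # the math word problem
print(example["answer"])                      # the multi-step reference solution
print(len(gsm8k["test"]))                     # size of the held-out test split
```

MATH is likewise public, but its hosted identifier varies across mirrors, so it is omitted from the sketch.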
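The Software Dependencies row notes that the paper mentions Python's built-in eval for answer checking without pinning any versions. The helper below is a hypothetical sketch of that style of numeric answer comparison; the function name, tolerance, and string fallback are assumptions, not the authors' implementation.

```python
# Hypothetical answer-checking helper in the spirit of the eval-based comparison the
# paper mentions; the name, tolerance, and fallback behavior are assumptions.
def answers_match(predicted: str, reference: str, tol: float = 1e-6) -> bool:
    try:
        # Evaluate both expressions and compare the resulting numbers.
        return abs(float(eval(predicted)) - float(eval(reference))) < tol
    except Exception:
        # Fall back to an exact string match when either expression fails to evaluate.
        return predicted.strip() == reference.strip()

assert answers_match("3 * 4", "12")
assert not answers_match("1/2", "0.6")
```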
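The Experiment Setup row reports the verifier training hyperparameters of Table 13. Below is a minimal sketch of that configuration expressed as Hugging Face TrainingArguments; only the single epoch, the batch size of 512, the 2 × 10^-6 learning rate, and the 3% warmup come from the paper, while the per-device split across the 8 A100 GPUs, the output path, and the precision flag are assumptions.

```python
# Hedged sketch of the verifier training configuration (Table 13) as TrainingArguments.
# Only num_train_epochs, the effective batch size of 512, the learning rate, and the
# 3% warmup ratio come from the paper; everything else here is an assumption.
from transformers import TrainingArguments

verifier_args = TrainingArguments(
    output_dir="./verifier-checkpoints",   # hypothetical output path
    num_train_epochs=1,                    # trained for a single epoch
    per_device_train_batch_size=64,        # 64 per GPU x 8 GPUs = effective batch size 512 (assumed split)
    learning_rate=2e-6,                    # 2 × 10^-6
    warmup_ratio=0.03,                     # 3% learning-rate warmup
    optim="adamw_torch",                   # the paper mentions the AdamW optimizer
    bf16=True,                             # assumption: mixed precision on A100 GPUs
)
```

These arguments only reproduce the reported schedule; the AutoPSV-specific process labels and verification loss are not shown here.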