AutoPSV: Automated Process-Supervised Verifier
Authors: Jianqiao Lu, Zhiyang Dou, Hongru Wang, Zeyu Cao, Jianbo Dai, Yunlong Feng, Zhijiang Guo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments across five datasets, including mathematical reasoning benchmarks and commonsense reasoning tasks. The results demonstrate that our method effectively improves the reasoning capability of the model with our highly efficient labeling scheme for process supervision. |
| Researcher Affiliation | Academia | (1) The University of Hong Kong, (2) The Chinese University of Hong Kong, (3) University of Cambridge, (4) University of Edinburgh, (5) Independent |
| Pseudocode | No | The paper describes methods textually and with diagrams (Figure 1) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of AutoPSV is available at https://github.com/rookie-joe/AutoPSV. |
| Open Datasets | Yes | For mathematical reasoning, we include GSM8K [21], containing math word problems requiring multi-step reasoning, and MATH [36], composed of high school-level competition problems covering a range of math subjects. |
| Dataset Splits | No | The paper utilizes several benchmarks (GSM8K, MATH, HellaSwag, Winogrande, ANLI) for evaluation and mentions using 'GSM8K training prompts' and 'GSM8K test set', but it does not explicitly provide the specific percentages or counts for training, validation, and test splits across all datasets used for the main experiments. |
| Hardware Specification | Yes | Our experiments were conducted using 8 NVIDIA A100 GPUs, each with 40GB of memory. |
| Software Dependencies | No | The paper mentions using Python's eval function and the AdamW optimizer, but it does not specify version numbers for Python, any programming libraries (e.g., PyTorch, TensorFlow), or other software dependencies. |
| Experiment Setup | Yes | Verifier Training Configuration: For both process-supervised and outcome-supervised methods, we maintain consistent training parameters as detailed in Table 13. The training process spans 1 epoch with a batch size of 512 and a learning rate of 2 × 10⁻⁶, incorporating a 3% learning rate warmup period. (Table 13: Verifier Training Hyperparameters; Table 14: SFT Training Hyperparameters.) |
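
The Open Datasets and Dataset Splits rows above refer to the publicly released GSM8K benchmark and its standard train/test split. The snippet below is a minimal sketch, not taken from the paper, of loading that public release with the Hugging Face `datasets` library; the dataset identifier and field names reflect the standard Hub release, not the authors' pipeline.

```python
# Minimal sketch (not the authors' code): load the public GSM8K release,
# whose standard train/test splits are the ones the paper's "training
# prompts" and "test set" most plausibly refer to.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")   # standard public configuration
print(gsm8k)                            # prints the available splits and their sizes
train_set = gsm8k["train"]              # multi-step math word problems with solutions
test_set = gsm8k["test"]                # held-out evaluation problems
print(train_set[0]["question"])         # each example has "question" and "answer" fields
```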
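
The Experiment Setup row reports the verifier training hyperparameters (1 epoch, batch size 512, learning rate 2 × 10⁻⁶, 3% warmup) and the Hardware Specification row reports 8 NVIDIA A100 40GB GPUs. Below is a minimal sketch of how such a run could be expressed with `transformers.TrainingArguments`; the per-device batch size, output path, optimizer name, and precision setting are illustrative assumptions, not values stated in the paper.

```python
# Minimal sketch (not the authors' configuration): the reported verifier
# training hyperparameters expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

verifier_args = TrainingArguments(
    output_dir="autopsv-verifier",    # hypothetical output path
    num_train_epochs=1,               # 1 training epoch (Table 13)
    per_device_train_batch_size=64,   # assumption: 64 per GPU x 8 GPUs = global batch size 512
    learning_rate=2e-6,               # 2 x 10^-6 (Table 13)
    warmup_ratio=0.03,                # 3% learning-rate warmup
    optim="adamw_torch",              # AdamW optimizer, as mentioned in the paper
    bf16=True,                        # assumption: mixed precision on A100 GPUs
)
```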