Neuro-Symbolic Data Generation for Math Reasoning

Authors: Zenan Li, Zhi Zhou, Yuan Yao, Xian Zhang, Yu-Feng Li, Chun Cao, Fan Yang, Xiaoxing Ma

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical experiments demonstrate the high quality of data generated by the proposed method, and that the LLMs, specifically LLaMA-2 and Mistral, when realigned with the generated data, surpass their state-of-the-art counterparts.
Researcher Affiliation | Collaboration | (1) State Key Lab of Novel Software Technology, Nanjing University, China; (2) Microsoft Research Asia
Pseudocode | Yes | Algorithm 1: The overall framework of problem mutation
Open Source Code | Yes | We provide the code for our data generation framework, as well as a small part of our generated data, in the supplementary material.
Open Datasets | Yes | We conduct our data generation on the training sets of two popular mathematical reasoning benchmarks: GSM8K [10] and MATH [11].
Dataset Splits | No | The paper mentions training and testing data for the GSM8K and MATH datasets (e.g., 'GSM8K ... contains 7,473 training data and 1,319 testing data'), but does not explicitly state the use or size of a separate validation split for model tuning.
Hardware Specification | Yes | In this paper, we fully fine-tune the LLAMA-2-7B and LLAMA-2-13B models using four H800 NVIDIA GPUs. ... The 70B model is fine-tuned using eight A800 NVIDIA GPUs.
Software Dependencies | No | The paper mentions several software components, such as Z3, CVC4, MathSAT, SymPy, SciPy, the PySMT framework, and SMT-LIB version 2.5, but does not provide specific version numbers for most of the key software dependencies (see the solver sketch below the table).
Experiment Setup | Yes | Each model is trained for 3 epochs with a batch size of 128 and a learning rate of 2e-5. For the fine-tuning of the LLAMA-2-70B model, we adopt the QLoRA [65] method with a learning rate of 1e-4. The rank and alpha of LoRA [66] are set to 96 and 16, respectively, with a dropout rate of 0.05 between the two matrices. The LoRA modules are added to both the attention and MLP layers. Moreover, we adopt the instruction template Prompt 1 used in Alpaca [67] for fine-tuning each model. (See the configuration sketch below the table.)
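
The Software Dependencies row lists the symbolic toolchain (Z3, CVC4, and MathSAT accessed through the PySMT framework, plus SymPy/SciPy). As a hedged illustration of why pinned solver versions matter for reproducing the symbolic side of the pipeline, the sketch below checks a toy SMT constraint through PySMT's solver abstraction; the formula and the choice of the Z3 backend are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the formula and the Z3 backend choice are assumptions,
# not the paper's actual data generation pipeline.
from pysmt.shortcuts import Symbol, Int, Equals, Plus, Times, Solver
from pysmt.typing import INT

# A toy constraint of the kind an SMT backend might be asked to solve:
# find an integer x such that 3*x + 2 == 17.
x = Symbol("x", INT)
formula = Equals(Plus(Times(Int(3), x), Int(2)), Int(17))

# PySMT dispatches to whichever backend is installed (Z3 here); solver behaviour
# can differ across backend versions, which is why unpinned dependencies hurt
# reproducibility.
with Solver(name="z3") as solver:
    solver.add_assertion(formula)
    if solver.solve():
        print("x =", solver.get_value(x))  # expected: x = 5
    else:
        print("unsatisfiable")
```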
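
The Experiment Setup row quotes concrete fine-tuning hyperparameters: 3 epochs, an effective batch size of 128, a 2e-5 learning rate for full fine-tuning, and QLoRA with rank 96, alpha 16, and dropout 0.05 on attention and MLP layers at a 1e-4 learning rate for the 70B model. A minimal sketch of how those quoted QLoRA settings could be expressed with the Hugging Face peft/transformers libraries is shown below; the library choice, target-module names, and batch-size decomposition are assumptions, since the paper's training code is not reproduced here.

```python
# Minimal sketch, assuming Hugging Face transformers + peft; the paper's actual
# training stack, module names, and batch decomposition are not specified here.
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings quoted in the Experiment Setup row: rank 96, alpha 16, dropout 0.05,
# with adapters on both attention and MLP layers (module names assume LLaMA-2).
lora_config = LoraConfig(
    r=96,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)

# Quoted schedule: 3 epochs, effective batch size 128; 1e-4 is the QLoRA learning
# rate for the 70B model (full fine-tuning of the 7B/13B models uses 2e-5 instead).
training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=2,   # 2 per GPU x 8 GPUs x 8 accumulation steps = 128
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
)
```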