Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency
Authors: Zenan Li, Yifan Wu, Zhaoyu Li, Xinming Wei, Xian Zhang, Fan Yang, Xiaoxing Ma
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving up to 0.22-1.35x relative improvements across various LLMs and baseline methods. |
| Researcher Affiliation | Collaboration | Zenan Li (1), Yifan Wu (2), Zhaoyu Li (3), Xinming Wei (2), Fan Yang (4), Xian Zhang (4), Xiaoxing Ma (1). Affiliations: (1) State Key Lab of Novel Software Technology, Nanjing University, China; (2) Peking University; (3) University of Toronto; (4) Microsoft Research Asia |
| Pseudocode | Yes | The overall procedure of our autoformalization framework is presented in Algorithm 1. |
| Open Source Code | Yes | The data and code are available at https://github.com/Miracle-Messi/Isa-AutoFormal |
| Open Datasets | Yes | We evaluate the proposed methods on the MATH [23] and miniF2F [25] datasets, both of which encompass a wide range of mathematical problems designed for different levels of complexity and abstraction. |
| Dataset Splits | Yes | where α ∈ [0, 1] is the hyperparameter controlling the trade-off between the symbolic equivalence and the semantic consistency, which practically can be tuned based on the validation set. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as particular CPU or GPU models, memory specifications, or compute cluster configurations. |
| Software Dependencies | No | The paper mentions key software components such as "scala-isabelle [73]", "Isabelle/HOL [74]", "Z3 [29]", "CVC5 [30]", and "BERT [68]" used in its methodology. However, it does not specify explicit version numbers for these software dependencies. |
| Experiment Setup | Yes | In addition, we employ few-shot prompting, and set the temperature of the generation process to 0.7 for all LLMs. ... In the experiments, the number of generations (k) is fixed at 10... Furthermore, we fix α = 0.5 based on the performance curve... |
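
The quoted setup (k = 10 sampled formalizations per problem, generation temperature 0.7, and α = 0.5 weighting symbolic equivalence against semantic consistency) can be pictured with the minimal Python sketch below. It assumes a pairwise equivalence vote over the k candidates and an embedding-based consistency score; `symbolically_equivalent` and `semantic_consistency` are hypothetical placeholders standing in for the paper's ATP-based equivalence check (Z3, CVC5 via Isabelle) and its back-translation similarity, not the authors' actual implementation.

```python
from itertools import combinations

# Hypothetical placeholders: in the real pipeline these would call an ATP
# (Z3 / CVC5 through Isabelle) and an embedding model, respectively.
def symbolically_equivalent(formal_a: str, formal_b: str) -> bool:
    """Stub: True if the two formal statements are proved equivalent."""
    raise NotImplementedError

def semantic_consistency(informal: str, formal: str) -> float:
    """Stub: similarity in [0, 1] between the informal statement and a
    back-translation of the formal candidate."""
    raise NotImplementedError

def select_candidate(informal: str, candidates: list[str], alpha: float = 0.5) -> str:
    """Rank k sampled formalizations by an alpha-weighted combination of
    pairwise symbolic-equivalence agreement and semantic consistency."""
    k = len(candidates)
    eq_votes = [0] * k
    # Each proved pairwise equivalence counts as a vote for both candidates.
    for i, j in combinations(range(k), 2):
        if symbolically_equivalent(candidates[i], candidates[j]):
            eq_votes[i] += 1
            eq_votes[j] += 1
    scores = []
    for i, cand in enumerate(candidates):
        eq_score = eq_votes[i] / (k - 1) if k > 1 else 1.0
        sem_score = semantic_consistency(informal, cand)
        scores.append(alpha * eq_score + (1 - alpha) * sem_score)
    return candidates[max(range(k), key=scores.__getitem__)]
```

With α = 0.5 the two signals are weighted equally; tuning α on a validation split, as the quoted "Dataset Splits" row suggests, only changes the `alpha` argument passed to this selection step.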