LEGO-Prover: Neural Theorem Proving with Growing Libraries
Authors: Haiming Wang, Huajian Xin, Chuanyang Zheng, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Xiaodan Liang
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | LEGO-Prover advances the state-of-the-art pass rate on mini F2F-valid (48.0% to 57.0%) and mini F2F-test (45.5% to 50.0%). During the proving process, LEGO-Prover also generates over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in a 4.9% improvement in success rate. |
| Researcher Affiliation | Collaboration | 1Sun Yat-sen University 2Huawei Noah s Ark Lab 3The Chinese University of Hong Kong 4City University of Hong Kong 5MBZUAI 6Dark Matter AI Research |
| Pseudocode | Yes | As illustrated in Fig. 2 (a) and Algorithm 2, the prover employs a three-step process to construct the proof. ... Algorithm 3 also illustrates the overall process for the evolver. |
| Open Source Code | Yes | Code available at https://github.com/wiio12/LEGO-Prover. |
| Open Datasets | Yes | We conduct extensive experiments on the popular theorem-proving dataset mini F2F (Zheng et al., 2021). |
| Dataset Splits | Yes | The problems are divided into valid and test sets, with 244 problems each. |
| Hardware Specification | No | The paper mentions using "Chat GPT (GPT-3.5)" and "Open AI s text-davinci-ada embedding model," which implies using cloud-based LLM services, but does not specify any particular hardware like GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | Yes | A combination of gpt-3.5-turbo, gpt-3.5-turbo-0301, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k, and gpt-3.5turbo-16k-0613 is employed, with a model being selected randomly during calls to the Open AI API. ... Practically, Chroma DB serves as our vector store, coupled with the Open AI s text-davinci-ada embedding model. ... For interacting with the Isabelle theorem prover, we employ the PISA environment (Jiang et al., 2021). |
| Experiment Setup | Yes | each problem undergoes 100 attempts of proving using Chat GPT (GPT-3.5). ... The temperature is consistently set at T = 0.7 across all procedures. Within the prover, 3-shot examples are leveraged for the decomposer. Regarding the formalizer, the quantity of reference skills nf is set to 6 for the valid set and 4 for the test set, and paired with 2 formalization in-context examples. For the directional transformer, the number of reference problem statements nd is set to 4, supplemented by two directional transformation in-context examples. For the request solver, 3 skills are retrieved for forming the in-context demonstration examples. |