Learning to Prove Theorems by Learning to Generate Theorems
Authors: Mingzhe Wang, Jia Deng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover and advances the state of the art of automated theorem proving in Metamath. |
| Researcher Affiliation | Academia | Mingzhe Wang, Princeton University (mingzhew@cs.princeton.edu); Jia Deng, Princeton University (jiadeng@cs.princeton.edu) |
| Pseudocode | Yes | We repeat this procedure to get a set of synthetic theorems (pseudo-code in the supplementary material). |
| Open Source Code | Yes | Code is available at https://github.com/princeton-vl/MetaGen. |
| Open Datasets | Yes | Dataset We experiment on two Metamath knowledge bases: iset.mm and set.mm. iset.mm formalizes intuitionistic logic and contains 463 axioms and 8916 theorems, which give rise to 8916 corresponding proof tasks. ... We use the same version of set.mm as Whalen (2016). It formalizes the ZFC set theory and contains 1099 axioms and 27218 theorems... (Megill & Wheeler, 2019) |
| Dataset Splits | Yes | We randomly split all theorems into three disjoint sets: a training set, a validation set, and a test set. Accordingly, we have three corresponding sets of proof tasks using the theorems as targets. More details about this setup are in the supplemental material. ...These proof tasks are divided into 21786 training tasks, 2712 validation tasks and 2720 test tasks. (A minimal sketch of such a random split appears below the table.) |
| Hardware Specification | No | Our GPU re-implementation of Holophrasm finds 557 proofs trained on 100% of human proofs, more than the number reported in Whalen (2016). We believe this is due to the fact that our prover runs faster on GPUs. |
| Software Dependencies | No | The paper mentions software components such as Holophrasm and the Metamath language, as well as deep learning architectures (GRU, bilinear layer) and algorithms (REINFORCE, GAN), but provides no version numbers for any software dependency. |
| Experiment Setup | Yes | On iset.mm, we generate 1M unique synthetic theorems. On set.mm, we generate 300K unique theorems for the setting of 0% of human proofs (after discarding any duplicates) and 1M unique theorems for 10% of the human training proofs. We generate 10M theorems for the setting of 100% of human proofs, by generating 1M unique theorems at a time (maximum allowed by memory limit) and repeating 10 times. ... (our prover has a time limit of 5 minutes for each run while their time limit is not mentioned). (The batched generation loop is sketched below the table.) |
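
For reference, a minimal sketch of the random train/validation/test split described in the Dataset Splits row. The fraction values and seed are assumptions for illustration; the paper reports only the resulting set sizes (21786 / 2712 / 2720 tasks on set.mm, roughly an 80/10/10 split), not the exact procedure.

```python
import random

def split_proof_tasks(theorems, val_frac=0.1, test_frac=0.1, seed=0):
    """Randomly partition theorems into disjoint train/val/test sets.

    val_frac, test_frac, and seed are hypothetical: the paper gives only
    the resulting sizes (21786 / 2712 / 2720 tasks on set.mm).
    """
    rng = random.Random(seed)
    shuffled = list(theorems)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```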
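
And a minimal sketch of the batched generation procedure from the Experiment Setup row, assuming a hypothetical `generate_one` callable standing in for the authors' theorem generator (the paper specifies the batch size and repeat count, not this interface):

```python
def collect_synthetic_theorems(generate_one, n_batches=10, batch_size=1_000_000):
    """Batched generation loop matching the set.mm 100%-of-proofs setting:
    collect up to batch_size unique theorems per run (the stated memory
    limit), repeat n_batches times, deduplicating only within each batch.

    Assumes generate_one() returns one serialized synthetic theorem per
    call and can produce at least batch_size distinct theorems per run.
    """
    batches = []
    for _ in range(n_batches):
        seen = set()
        while len(seen) < batch_size:
            seen.add(generate_one())  # duplicates are discarded by the set
        batches.append(list(seen))
        # seen goes out of scope here, freeing memory before the next run
    return batches
```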