Learning to Prove Theorems by Learning to Generate Theorems
Authors: Mingzhe Wang, Jia Deng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world tasks demonstrate that synthetic data from our approach improves the theorem prover and advances the state of the art of automated theorem proving in Metamath. |
| Researcher Affiliation | Academia | Mingzhe Wang, Princeton University (mingzhew@cs.princeton.edu); Jia Deng, Princeton University (jiadeng@cs.princeton.edu) |
| Pseudocode | Yes | We repeat this procedure to get a set of synthetic theorems (pseudo-code in the supplementary material). |
| Open Source Code | Yes | Code is available at https://github.com/princeton-vl/MetaGen. |
| Open Datasets | Yes | Dataset We experiment on two Metamath knowledge bases: iset.mm and set.mm. iset.mm formalizes intuitionistic logic and contains 463 axioms and 8916 theorems, which give rise to 8916 corresponding proof tasks. ... We use the same version of set.mm as Whalen (2016). It formalizes the ZFC set theory and contains 1099 axioms and 27218 theorems... (Megill & Wheeler, 2019) |
| Dataset Splits | Yes | We randomly split all theorems into three disjoint sets: a training set, a validation set, and a test set. Accordingly, we have three corresponding sets of proof tasks using the theorems as targets. More details about this setup are in the supplemental material. ...These proof tasks are divided into 21786 training tasks, 2712 validation tasks and 2720 test tasks. (A minimal sketch of such a random split appears below the table.) |
| Hardware Specification | No | Our GPU re-implementation of Holophrasm finds 557 proofs trained on 100% of human proofs, more than the number reported in Whalen (2016). We believe this is due to the fact that our prover runs faster on GPUs. |
| Software Dependencies | No | The paper mentions software components such as Holophrasm and the Metamath language, as well as deep learning architectures (GRU, bilinear layer) and algorithms (REINFORCE, GAN), but provides no version numbers for any software dependency. |
| Experiment Setup | Yes | On iset.mm, we generate 1M unique synthetic theorems. On set.mm, we generate 300K unique theorems for the setting of 0% of human proofs (after discarding any duplicates) and 1M unique theorems for 10% of the human training proofs. We generate 10M theorems for the setting of 100% of human proofs, by generating 1M unique theorems at a time (maximum allowed by memory limit) and repeating 10 times. ... (our prover has a time limit of 5 minutes for each run while their time limit is not mentioned). (The batched generation loop is sketched below the table.) |
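
For reference, a minimal sketch of the random train/validation/test split described in the Dataset Splits row. The fraction values and seed are assumptions for illustration; the paper reports only the resulting set sizes (21786 / 2712 / 2720 tasks on set.mm, roughly an 80/10/10 split), not the exact procedure.

```python
import random

def split_proof_tasks(theorems, val_frac=0.1, test_frac=0.1, seed=0):
    """Randomly partition theorems into disjoint train/val/test sets.

    val_frac, test_frac, and seed are hypothetical: the paper gives only
    the resulting sizes (21786 / 2712 / 2720 tasks on set.mm).
    """
    rng = random.Random(seed)
    shuffled = list(theorems)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```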
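
And a minimal sketch of the batched generation procedure from the Experiment Setup row, assuming a hypothetical `generate_one` callable standing in for the authors' theorem generator (the paper specifies the batch size and repeat count, not this interface):

```python
def collect_synthetic_theorems(generate_one, n_batches=10, batch_size=1_000_000):
    """Batched generation loop matching the set.mm 100%-of-proofs setting:
    collect up to batch_size unique theorems per run (the stated memory
    limit), repeat n_batches times, deduplicating only within each batch.

    Assumes generate_one() returns one serialized synthetic theorem per
    call and can produce at least batch_size distinct theorems per run.
    """
    batches = []
    for _ in range(n_batches):
        seen = set()
        while len(seen) < batch_size:
            seen.add(generate_one())  # duplicates are discarded by the set
        batches.append(list(seen))
        # seen goes out of scope here, freeing memory before the next run
    return batches
```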