Premise Selection for Theorem Proving by Deep Graph Embedding

Authors: Mingzhe Wang, Yihe Tang, Jian Wang, Jia Deng

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments using the HolStep dataset [10], which consists of over two million conjecture-statement pairs that can be used to evaluate premise selection. The results show that our graph-embedding approach achieves large improvement over sequence-based models. In particular, our approach improves the state-of-the-art accuracy on HolStep by 7.3%.
Researcher Affiliation | Academia | Mingzhe Wang, Yihe Tang, Jian Wang, Jia Deng. University of Michigan, Ann Arbor.
Pseudocode | No | The paper describes methods and equations but does not contain a dedicated pseudocode or algorithm block.
Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We evaluate our approach on the HolStep dataset [10], a recently introduced benchmark for evaluating machine learning approaches for theorem proving. It was constructed from the proof trace files of the HOL Light theorem prover [7] on its multivariate analysis library [48] and the formal proof of the Kepler conjecture.
Dataset Splits | Yes | The dataset contains 11,410 conjectures, including 9,999 in the training set and 1,411 in the test set. There are 2,209,076 conjecture-statement pairs in total. We hold out 700 conjectures from the training set as the validation set to tune hyperparameters and perform ablation analysis.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper mentions RMSProp as an optimizer but does not specify versions for software libraries or dependencies.
Experiment Setup | Yes | We train our networks using RMSProp [50] with 0.001 learning rate and 1 × 10⁻⁴ weight decay. We lower the learning rate by 3X after each epoch. We train all models for five epochs, and all networks converge after about three or four epochs.
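The Experiment Setup row is concrete enough to illustrate in code. Below is a minimal sketch of that training schedule, written in PyTorch as an assumption (the paper does not name a framework): RMSProp with a 0.001 learning rate and 1 × 10⁻⁴ weight decay, the learning rate divided by 3 after each epoch, and five epochs in total. The model and dataset are small placeholders, not the authors' graph-embedding network or the HolStep data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data (assumptions for illustration only): these stand in
# for the authors' graph-embedding network and the HolStep conjecture-statement pairs.
model = nn.Linear(256, 2)
data = TensorDataset(torch.randn(64, 256), torch.randint(0, 2, (64,)))
loader = DataLoader(data, batch_size=16, shuffle=True)

# Optimizer and schedule as reported: RMSProp, lr = 0.001, weight decay = 1e-4,
# learning rate lowered by 3x after each epoch, five epochs of training.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.0 / 3.0)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # multiplies the learning rate by 1/3 at the end of each epoch
```

ExponentialLR with gamma = 1/3, stepped once per epoch, reproduces the reported "lower the learning rate by 3X after each epoch" schedule; any equivalent per-epoch multiplicative decay would do.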